Survey of Performance Evaluation Tools
Last modified: 10/18/05
Summary
Given the features of existing performance evaluation tools, we want to:
- Determine collectable performance metrics (what is recorded, hardware counters, etc.)
- Identify each tool's architectural components, e.g., data and communication (protocol) management
- Identify software capabilities: monitoring or profiling, visualization, and modeling

Goals are:
- Investigate how a component-based performance evaluation framework can be constructed by leveraging existing tools
- Investigate scaling (up and out) of this framework to large-scale systems (1,000 to 10,000 nodes)
- Analyze workload characterization on deployed platforms for real applications and users
Outline
Background
Tools
- Monitoring
- Profiling/Tracing
Workload Characterization (WLC) Techniques
A proposal: performance evaluation frameworks
Background
What is Workload?
According to the Cambridge dictionary, a workload is defined as:
“The amount of work to be done, especially by a particular person or machine in a period of time”
In the realm of computer systems, a workload can be loosely defined as:
A set of requests presented to a computer in a period of time.
Workloads can be classified into:
- Synthetic workload: created for controlled testing
- Real workload: any requests observed during normal operations
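The synthetic/real distinction can be made concrete: a synthetic workload is typically generated from a statistical model so it can be controlled and repeated. A minimal sketch (my own illustration, not from any tool in this survey; the rate and seed are made-up parameters) that draws arrivals from a Poisson process:

```python
import random

def synthetic_workload(rate, n_requests, seed=42):
    """Generate n_requests arrival timestamps with exponentially
    distributed inter-arrival times (a Poisson arrival process)."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    for _ in range(n_requests):
        t += rng.expovariate(rate)  # mean inter-arrival = 1/rate
        arrivals.append(t)
    return arrivals

# Controlled and repeatable: the same seed yields the same workload
w1 = synthetic_workload(rate=10.0, n_requests=1000)
w2 = synthetic_workload(rate=10.0, n_requests=1000)
assert w1 == w2
```

Because the generator is seeded, the workload is repeatable — the key advantage synthetic workloads have over real ones.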
What is WLC?
WLC plays a key role in all performance evaluation studies.
WLC is a synthetic description of a workload by means of quantitative parameters and functions.
The objective is to formulate a model that captures and reproduces the static and dynamic behavior of real workloads.
WLC is a difficult as well as a neglected task:
- A large amount of measurements must be collected
- Extensive analysis has to be performed
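As a tiny example of “quantitative parameters and functions”, a characterization step might reduce raw measurements to a few summary statistics; the request sizes below are hypothetical:

```python
import statistics

def characterize(values):
    """Reduce a list of measurements (e.g., request sizes or
    inter-arrival times) to a few quantitative parameters."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    return {
        "mean": mean,
        "stdev": stdev,
        # Coefficient of variation: > 1 suggests a bursty workload
        "cv": stdev / mean if mean else float("nan"),
    }

# Hypothetical request sizes (KB) collected by a monitor
params = characterize([4, 4, 8, 16, 4, 64, 4, 8])
```

A real characterization would go further (distribution fitting, clustering of similar requests), but the reduction from raw data to parameters is the essential step.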
WLC in the Performance Evaluation Life Cycle

[Diagram: the performance evaluation life cycle runs from initial sizing & resizing through evaluation, production, and on-going operation, driven by competition, hardware, software, and growth. Activities at each stage:
- Performance Modeling: analyze requirements, predict requirements
- Workload Analysis: analyze performance, profile the application
- Performance Analysis: analyze performance, optimize resource usage, predict requirements, predict application responsiveness
- Performance Tuning: optimize application responsiveness, predict the impact of changes
- Performance Reporting: report performance, report resource usage]
WLC in Performance Evaluation Methodology
Mathematical models
Workloads and Data Flows

[Diagram: data flows in an experimental environment. Workloads (real applications, benchmark applications, micro-benchmark programs, synthetic benchmark programs) plus datasets and made-up data drive either a real system or a simulator (execution-driven, trace-driven, or stochastic simulation). A monitor (or profiler) captures traces; analysis produces distributions and other statistics; a generator turns those into synthetic traces. © 2003, Carla Ellis]
Workload Issues
Selection of benchmarks; requirements:
- Repeatability
- Availability (software)
- Acceptance (by the community)
- Representative (of typical usage, e.g., timeliness)
- Realistic (predictive of real performance, e.g., scaling issues)
Types of workloads: real, synthetic
Workload monitoring & tracing: monitor/profiler design; compression, simulation
Workload characterization
Workload generators
Types: Real and Synthetic Workloads
Real workloads:
- Advantages: represent reality, “deployment experience”
- Disadvantage: they are uncontrolled
  - Can't be repeated or described simply
  - Difficult to analyze
- Nevertheless, often useful for “final analysis” papers

Synthetic workloads:
- Advantages: controllable, repeatable, portable to other systems, easily modified
- Disadvantage: can never be sure the real world will be the same (i.e., are they representative?)
Types: Instruction Workloads
Useful only for CPU performance, but they teach useful lessons for other situations.
Developed over decades:
- “Typical” instruction (ADD)
- Instruction mix (by frequency of use)
  - Sensitive to compiler, application, and architecture
  - Still used today (MFLOPS)
- Modern complexity makes mixes invalid: pipelining, data/instruction caching, prefetching
- Kernel: an inner loop that does useful work (sieve, matrix inversion, sort, etc.); ignores setup and I/O, so it can be timed by analysis if desired (at least in theory)
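The instruction-mix idea reduces to a frequency-weighted average. A sketch (hypothetical mix, cycle counts, and clock rate) of how a mix yields a CPI and a naive native-MIPS figure:

```python
def mix_mips(mix, cycles_per_class, clock_hz):
    """Estimate native MIPS from an instruction mix: take the
    frequency-weighted average cycles-per-instruction (CPI),
    then convert to instructions per second."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9  # frequencies must sum to 1
    cpi = sum(freq * cycles_per_class[cls] for cls, freq in mix.items())
    return clock_hz / cpi / 1e6

# Hypothetical mix and per-class cycle counts for a 100 MHz CPU
mix = {"load/store": 0.3, "alu": 0.5, "branch": 0.2}
cycles = {"load/store": 2, "alu": 1, "branch": 3}
mips = mix_mips(mix, cycles, clock_hz=100e6)  # CPI = 0.6 + 0.5 + 0.6 = 1.7
```

This is exactly why mixes break down on modern hardware: with pipelining, caching, and prefetching, a per-class cycle count is no longer a fixed number.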
Real Applications
Standard:
- Pick a representative application
- Pick sample data
- Run it on the system to be tested
- Easy to do, accurate for that sample data
- Fails to consider other applications and data

Microkernel:
- Choose the most important subset of functions
- Write a benchmark to test those functions
- Tests what the computer will be used for
- Need to be sure important characteristics aren't missed
Synthetic Applications
Complete programs:
- Designed specifically for measurement
- May do real or “fake” work
- May be adjustable (parameterized)

Two major classes:
- Synthetic benchmarks: often used to compare general-purpose computer systems for general-purpose use. Examples: Sieve, Ackermann's function, Whetstone, Linpack, Dhrystone, Livermore loops, SPEC, MAB
- Microbenchmarks: for I/O, network, and other non-CPU measurements. Examples: HPC toolkits
Workload Considerations
- Services exercised
- Level of detail
- Representativeness
- Timeliness
- Other considerations
Services Exercised
What services does a system actually use?
- A faster CPU won't speed up “cp”
- Network performance is useless for matrix work

What metrics measure these services?
- MIPS for CPU speed
- Bandwidth for network and I/O
- TPS for transaction processing

It may be possible to isolate interfaces to just one component (e.g., an instruction mix for the CPU). Systems often have layers of services:
- Consider the services provided and used by each component
- Can cut at any point and insert a workload
Integrity
Computer systems are complex; the effects of interactions are hard to predict, so be sure to test the entire system.
It is important to understand the balance between components (i.e., don't use a 90%-CPU mix to evaluate an I/O-bound application).
Sometimes only individual components are compared:
- Would a new CPU speed up our system?
- How does IPv6 affect Web server performance?
But a component may not be directly related to performance.
Workload Characterization
- Identify the service provided by a major subsystem
- List factors affecting performance
- List metrics that quantify demands and performance
- Identify the workload provided to that service
Example: Web Service
Web Client Analysis
- Services: visit page, follow hyperlink, display information
- Factors: page size, number of links, fonts required, embedded graphics, sound
- Metrics: response time
- Workload: a list of pages to be visited and links to be followed

Network Analysis
- Services: connect to server, transmit request, transfer data
- Factors: bandwidth, latency, protocol used
- Metrics: connection setup time, response latency, achieved bandwidth
- Workload: a series of connections to one or more servers, with data transfer

Web Server Analysis
- Services: accept and validate connection, fetch HTTP data
- Factors: network performance, CPU speed, system load, disk subsystem performance
- Metrics: response time, connections served
- Workload: a stream of incoming HTTP connections and requests

File System Analysis
- Services: open file, read file (writing doesn't matter for a Web server)
- Factors: disk drive characteristics, file system software, cache size, partition size
- Metrics: response time, transfer rate
- Workload: a series of file-transfer requests

Disk Drive Analysis
- Services: read sector, write sector
- Factors: seek time, transfer rate
- Metrics: response time
- Workload: a statistically generated stream of read/write requests
[Diagram: Web page visits drive the Web client; TCP/IP connections cross the network; HTTP requests reach the Web server; Web page accesses go to the file system; disk transfers go to the disk drive]
Level of Detail
Detail trades off accuracy vs. cost:
- Highest detail: a complete trace
- Lowest detail: one request (most common)
- Intermediate approach: weight by frequency
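The intermediate “weight by frequency” approach can be sketched: collapse the complete trace into distinct requests with frequency weights (the trace below is hypothetical):

```python
from collections import Counter

def weight_by_frequency(trace):
    """Collapse a complete trace into (request, weight) pairs --
    the intermediate level of detail between keeping the full
    trace and keeping only the single most common request."""
    counts = Counter(trace)
    total = len(trace)
    return {req: n / total for req, n in counts.most_common()}

# Hypothetical trace of file-system operations
trace = ["read"] * 70 + ["write"] * 20 + ["open"] * 10
weights = weight_by_frequency(trace)
# weights == {"read": 0.7, "write": 0.2, "open": 0.1}
```

The weighted form loses ordering and timing information (which the full trace keeps) but is far cheaper to store and replay.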
Representative
Obviously, the workload should represent the desired application:
- Arrival rate of requests
- Resource demands of each request
- Resource usage profile of the workload over time
Again, accuracy and cost trade off; you need to understand whether the detail matters.
Timeliness
Usage patterns change over time:
- File sizes grow to match disk sizes
- Web pages grow to match network bandwidth
If using “old” workloads, be sure user behavior hasn't changed. Even worse, behavior may change after the test, as a result of installing the new system.
Other Considerations
- Loading levels: full capacity, beyond capacity, actual usage
- External components not considered as parameters
- Repeatability of the workload
Tools
Desired Features of a Measurement Tool

Basic uses of performance evaluation tools:
- Performance analysis and enhancement of system operations
- Troubleshooting and recovery of operations of system components
- Support for components performing:
  - Job scheduling
  - Resource management (e.g., when accomplishing load balancing)
  - Collection of information on applications
  - Fault detection or prevention (HA)
  - Detection of security threats and “holes”

Desirable features include, but are not limited to:
- Non-intrusiveness
- Integration with batch job management systems
- System usage statistics retrieval
- Availability in cluster distributions
- Ability to scale to large systems
- Graphical interface (standard GUI or web portal)
Criterion for Comparing Tools
Evaluation criteria:
- Metrics collected
- Monitored/profiled entities
- Visualization
- Data and communication management

Other criteria:
- Knowledge representations
- Tool interoperability
- “Standard” APIs
- Scalability
Some Terminology
Monitoring: a program that observes, supervises, or controls the activities of other programs.
Profiling: a statistical view of how well resources are being used by a program, often in the form of a graph or table, representing distinctive features or characteristics.
Tracing: a record of (system or application) events captured by a program.
Monitoring
System monitoring: provides continuous collection and aggregation of system performance data.
Application monitoring: measures actual application performance via a batch system.

Tools:
- PerMiner perfminer.pdc.kth.se/
- NWPerf
- SuperMon supermon.sourceforge.net/
- Hawkeye www.cs.wisc.edu/condor/hawkeye/
- Ganglia ganglia.sourceforge.net/
- CluMon clumon.ncsa.uiuc.edu/
Profiling and Tracing
Provide static instrumentation tools that focus on source code over which users have direct control.

Tools:
- TAU www.cs.uoregon.edu/research/tau/home.php
- Paradyn www.paradyn.org/
- MPE/Jumpshot www-unix.mcs.anl.gov/perfvis/
- Dimemas/Paraver
- mpiP www.llnl.gov/CASC/mpip/
- Dynaprof icl.cs.utk.edu/~mucci/dynaprof/
- KOJAK www.fz-juelich.de/zam/kojak/
- ICT www.intel.com/cd/software/products/asmo-na/eng/cluster/index.htm
- Pablo pablo.renci.org/Software/Pablo/pablo.htm
- MPICL/ParaGraph www.csm.ornl.gov/picl/, www.csar.uiuc.edu/software/paragraph/
- CoPilot www.sgi.com/products/software/co-pilot/
- IPM www.nersc.gov/projects/ipm/
- PerfSuite perfsuite.ncsa.uiuc.edu/
Data Management and Data Formats

Databases/query languages: JDBC, SQL

Data formats:
- HDF: software and file formats for scientific data management. The HDF software includes I/O libraries and tools for analyzing, visualizing, and converting scientific data. There are two HDF formats: the original HDF (4.x and earlier releases) and HDF5, which is a completely new format and library.
- NetCDF, the Network Common Data Form: provides an interface for array-oriented data access and a library that supports an implementation of the interface.
- XDR
- XML, the Extensible Markup Language: provides a standard way to define application-specific data languages.
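As a small illustration of using XML to define an application-specific data language for metric exchange (the element and attribute names here are invented for the example, not part of any format above):

```python
import xml.etree.ElementTree as ET

def metrics_to_xml(host, samples):
    """Serialize (metric, value) samples for one host into XML."""
    root = ET.Element("metrics", host=host)
    for name, value in samples:
        ET.SubElement(root, "sample", name=name).text = str(value)
    return ET.tostring(root, encoding="unicode")

xml_doc = metrics_to_xml("node042", [("cpu_user", 0.93), ("mem_free_mb", 512)])
# Round-trips with any standard XML parser
parsed = ET.fromstring(xml_doc)
assert parsed.get("host") == "node042"
```

The same data could equally be written to HDF or NetCDF when the volume warrants a binary format; XML trades compactness for human readability and ubiquitous parsers.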
Monitoring Tools
CoPilot
- Metrics collected
- Monitored entities
- Visualizations

Hawkeye
- Metrics collected
- Monitored entities
- Visualizations

IPM
- Metrics collected
- Monitored entities
- Visualizations

PerfSuite
- Metrics collected
- Monitored entities
- Visualizations

NWPerf
- Metrics collected
- Profiled entities
- Visualizations

PerMiner
- Metrics collected
- Profiled entities
- Visualizations

SuperMon
- Metrics collected
- Profiled entities
- Visualizations

CluMon
- Metrics collected
- Profiled entities
- Visualizations
Profiling/Tracing Tools
TAU
Metrics recorded
Two modes: profile, trace.
Profile mode:
- Inclusive/exclusive time spent in functions
- Hardware counter information
  - PAPI/PCL: L1/L2/L3 cache reads/writes/misses, TLB misses, cycles, integer/floating-point/load/store/stall instructions executed, wall clock time, virtual time
- Other OS timers (gettimeofday, getrusage)
- MPI message size sent
Trace mode:
- Same as profile (minus hardware counters?)
- Message send time, message receive time, message size, message sender/recipient(?)

Profiled entities
- Functions (automatic & dynamic), loops and regions (manual instrumentation)
TAU
Visualizations
Profile mode:
- Text-based: pprof (example on the next slide) shows a summary of profile information
- Graphical: racy (old), jracy a.k.a. paraprof
Trace mode:
- No built-in visualizations
- Can export to CUBE (see KOJAK), Jumpshot (see MPE), and Vampir format (see Intel Cluster Tools)
TAU – pprof output
Reading Profile files in profile.*

NODE 0; CONTEXT 0; THREAD 0:
%Time  Exclusive  Inclusive   #Call  #Subrs  Inclusive  Name
         msec     total msec                 usec/call
100.0    0.207      20,011      1      2     20011689   main() (calls f1, f5)
 75.0    1,001      15,009      1      2     15009904   f1() (sleeps 1 sec, calls f2, f4)
 75.0    1,001      15,009      1      2     15009904   main() => f1()
 50.0    4,003      10,007      2      2      5003524   f2() (sleeps 2 sec, calls f3)
 45.0    4,001       9,005      1      1      9005230   f1() => f4()
 45.0    4,001       9,005      1      1      9005230   f4() (sleeps 4 sec, calls f2)
 30.0    6,003       6,003      2      0      3001710   f2() => f3()
 30.0    6,003       6,003      2      0      3001710   f3() (sleeps 3 sec)
 25.0    2,001       5,003      1      1      5003546   f4() => f2()
 25.0    2,001       5,003      1      1      5003502   f1() => f2()
 25.0    5,001       5,001      1      0      5001578   f5() (sleeps 5 sec)
 25.0    5,001       5,001      1      0      5001578   main() => f5()
TAU – paraprof
Paradyn
Metrics recorded
- Number of CPUs, number of active threads, CPU time and inclusive CPU time
- Function calls to and by
- Synchronization (# operations, wait time, inclusive wait time)
- Overall communication (# messages, bytes sent and received), collective communication (# messages, bytes sent and received), point-to-point communication (# messages, bytes sent and received)
- I/O (# operations, wait time, inclusive wait time, total bytes)
- All metrics recorded as “time histograms” (a fixed-size data structure)

Profiled entities
- Functions only (but includes functions linked to in existing libraries)
Paradyn
Visualizations
- Time histograms
- Tables
- Bar charts
- “Terrains” (3-D histograms)
Paradyn
Time histogram view across multiple hosts
Paradyn – table (current metric values)
Table (current metric values)
Bar chart (current or average metric values)
MPE/Jumpshot
Metrics collected
- MPI message send time, receive time, size, and message sender/recipient
- User-defined event entry & exit

Profiled entities
- All MPI functions
- Functions or regions via manual instrumentation and custom events

Visualization
- Jumpshot: timeline view (space-time diagram overlaid on a Gantt chart), histogram
Jumpshot
Timeline view; histogram view
Dimemas/Paraver
Metrics recorded (MPITrace)
- All MPI functions
- Hardware counters (2 chosen from the following two lists; uses PAPI)

Counter 1:
- Cycles
- Issued instructions, loads, stores, store conditionals
- Failed store conditionals
- Decoded branches
- Quadwords written back from scache(?)
- Correctable scache data array errors(?)
- Primary/secondary I-cache misses
- Instructions mispredicted from scache way prediction table(?)
- External interventions (cache coherency?)
- External invalidations (cache coherency?)
- Graduated instructions

Counter 2:
- Cycles
- Graduated instructions, loads, stores, store conditionals, floating point instructions
- TLB misses
- Mispredicted branches
- Primary/secondary data cache miss rates
- Data mispredictions from scache way prediction table(?)
- External intervention/invalidation (cache coherency?)
- Store/prefetch exclusive to clean/shared block
Dimemas/Paraver
Profiled entities (MPITrace)
- All MPI functions (message start time, message end time, message size, message recipient/sender)
- User regions/functions via manual instrumentation

Visualization
- Timeline display (like Jumpshot): shows a Gantt chart and messages; can also overlay hardware counter information; clicking on the timeline brings up a text listing of events near where you clicked
- 1D/2D analysis modules
Paraver – timeline
timeline (HW counter)
timeline (standard)
Paraver – text module
Paraver
1D analysis
2D analysis
mpiP
Metrics collected
- Start time, end time, and message size for each MPI call

Profiled entities
- MPI function calls, via PMPI wrappers

Visualization
- Text-based output, with a graphical browser that displays statistics in-line with source
- Displayed information:
  - Overall time (%) for each MPI node
  - Top 20 callsites by time (MPI%, App%, variance)
  - Top 20 callsites by message size (MPI%, App%, variance)
  - Min/max/average/MPI%/App% time spent at each call site
  - Min/max/average/sum of message sizes at each call site
- Definitions:
  - App time = wall clock time between MPI_Init and MPI_Finalize
  - MPI time = all time consumed by MPI functions
  - App% = % of the metric relative to overall app time
  - MPI% = % of the metric relative to overall MPI time
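The App% and MPI% definitions above are simple ratios; a sketch computing them for hypothetical per-callsite times (the callsite labels are made up for the example):

```python
def callsite_percentages(callsite_times, app_time):
    """Given total seconds spent at each MPI callsite and the
    wall-clock app time (MPI_Init..MPI_Finalize), compute
    mpiP-style App% and MPI% figures."""
    mpi_time = sum(callsite_times.values())
    report = {}
    for site, t in callsite_times.items():
        report[site] = {
            "app_pct": 100.0 * t / app_time,  # share of the whole run
            "mpi_pct": 100.0 * t / mpi_time,  # share of MPI time only
        }
    return report

# Hypothetical: a 10 s run with 4 s total spent in MPI
r = callsite_percentages({"MPI_Allreduce@foo.c:42": 3.0,
                          "MPI_Send@bar.c:17": 1.0}, app_time=10.0)
```

The two denominators answer different questions: App% says how much the callsite matters to the run overall; MPI% says how it ranks among communication costs.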
mpiP – graphical view
Dynaprof
Metrics collected
- Wall clock time or a PAPI metric for each profiled entity
- Collects inclusive, exclusive, and 1-level call tree % information

Profiled entities
- Functions (dynamic instrumentation)

Visualizations
- Simple text-based output
- Simple GUI (shows the same info as the text-based output)
Dynaprof – output
[leko@eta-1 dynaprof]$ wallclockrpt lu-1.wallclock.16143
Exclusive Profile.
Name Percent Total Calls
------------- ------- ----- -------
TOTAL 100 1.436e+11 1
unknown 100 1.436e+11 1
main 3.837e-06 5511 1
Inclusive Profile.
Name Percent Total SubCalls
------------- ------- ----- -------
TOTAL 100 1.436e+11 0
main 100 1.436e+11 5
1-Level Inclusive Call Tree.
Parent/-Child Percent Total Calls
------------- ------- ----- --------
TOTAL 100 1.436e+11 1
main 100 1.436e+11 1
- f_setarg.0 1.414e-05 2.03e+04 1
- f_setsig.1 1.324e-05 1.902e+04 1
- f_init.2 2.569e-05 3.691e+04 1
- atexit.3 7.042e-06 1.012e+04 1
- MAIN__.4 0 0 1
KOJAK
Metrics collected
- MPI: message start time, receive time, size, message sender/recipient
- Manual instrumentation: start and stop times
- One PAPI metric per run (only FLOPS and L1 data misses visualized)

Profiled entities
- MPI calls (MPI wrapper library)
- Function calls (automatic instrumentation, only available on a few platforms)
- Regions and function calls via manual instrumentation

Visualizations
- Can export traces to the Vampir trace format (see ICT)
- Shows profile and analyzed data via CUBE (described on the next few slides)
CUBE overview: simple description
Uses a 3-pane approach to display information:
- Metric pane
- Module/call-tree pane (right-clicking brings up the source code location)
- Location pane (system tree)
Each item is displayed along with a color indicating the severity of its condition.
Severity can be expressed 4 ways:
- Absolute (time)
- Percentage
- Relative percentage (changes the module & location panes)
- Comparative percentage (differences between executions)
Despite the documentation, the interface is actually quite intuitive.
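The four severity modes amount to simple transformations of a metric value. A sketch of my own formulation of what each mode computes (not CUBE code; the pane semantics are simplified):

```python
def severity(value, total=None, reference=None, other_run=None,
             mode="absolute"):
    """Express one metric value in the four ways CUBE's GUI offers."""
    if mode == "absolute":       # raw value, e.g., seconds
        return value
    if mode == "percentage":     # share of the whole run
        return 100.0 * value / total
    if mode == "relative":       # share of a selected reference item
        return 100.0 * value / reference
    if mode == "comparative":    # difference against another execution
        return value - other_run
    raise ValueError(mode)

assert severity(2.0, total=8.0, mode="percentage") == 25.0
assert severity(2.0, reference=4.0, mode="relative") == 50.0
assert severity(2.0, other_run=1.5, mode="comparative") == 0.5
```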
Intel Cluster Tools (ICT)
Metrics collected
- MPI functions: start time, end time, message size, message sender/recipient
- User-defined events: counter, start & end times
- Code location for source-code correlation

Instrumented entities
- MPI functions via a wrapper library
- User functions via binary instrumentation(?)
- User functions & regions via manual instrumentation

Visualizations
- Several types: timelines, statistics & counter info (described in the next slides)
ICT visualizations – timelines & summaries
Summary Chart Display
- Shows how much work is spent in MPI calls

Timeline Display
- Zoomable, scrollable timeline representation of program execution

Fig. 1 Summary Chart; Fig. 2 Timeline Display
ICT visualizations – histogram & counters
Summary Timeline
- Timeline/histogram representation showing the number of processes in each activity per time bin

Counter Timeline
- Value-over-time representation (behavior depends on the counter definition in the trace)

Fig. 3 Summary Timeline; Fig. 4 Counter Timeline
![Page 65: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/65.jpg)
ICT visualizations – message stats & process profiles
Message Statistics Display: message data to/from each process (count, length, rate, duration)
Process Profile Display: per-process data regarding activities
Fig. 5 Message Statistics
Fig. 6 Process Profile Display
![Page 66: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/66.jpg)
ICT visualizations – general stats & call tree
Statistics Display: various statistics regarding activities in histogram, table, or text format
Call Tree Display
Fig. 7 Statistics Display
Fig. 8 Call Tree Display
![Page 67: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/67.jpg)
ICT visualizations – source & activity chart
Source View: source code correlation with events in Timeline
Activity Chart: per-process histograms of Application and MPI activity
Fig. 9 Source View
Fig. 10 Activity Chart
![Page 68: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/68.jpg)
ICT visualizations – process timeline & activity chart
Process Timeline: activity timeline and counter timeline for a single process
Process Activity Chart: same type of information as Global Summary Chart
Process Call Tree: same type of information as Global Call Tree
Figure 11. Process Timeline
Figure 12. Process Activity Chart & Call Tree
![Page 69: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/69.jpg)
Pablo
Metrics collected:
Time inclusive/exclusive of a function
Hardware counters via PAPI
Summary metrics computed from timing info (min/max/avg/stdev/count)
Profiled entities:
Functions, function calls, and outer loops (all selected via GUI)
Visualizations:
Displays derived summary metrics color-coded and inline with source code
Shown on next slide
![Page 70: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/70.jpg)
SvPablo
![Page 71: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/71.jpg)
MPICL/Paragraph
Metrics collected:
MPI functions: start time, end time, message size, message sender/recipient
Manual instrumentation: start time, end time, “work” done (up to user to pass this in)
Profiled entities:
MPI function calls via PMPI interface
User functions/regions via manual instrumentation
Visualizations:
Many, separated into 4 categories: utilization, communication, task, “other”
Described in following slides
![Page 72: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/72.jpg)
ParaGraph visualizations
Utilization visualizations:
Display rough estimate of processor utilization
Utilization broken down into 3 states:
Idle – when the program is blocked waiting for a communication operation (or it has stopped execution)
Overhead – when the program is performing communication but is not blocked (time spent within the MPI library)
Busy – when executing any part of the program other than communication
“Busy” doesn’t necessarily mean useful work is being done, since anything that is not communication is assumed to be busy
Communication visualizations:
Display different aspects of communication
Frequency, volume, overall pattern, etc.
“Distance” computed from the topology set in the options menu
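The three-state accounting can be sketched as follows. This assumes a simplified interval format — ParaGraph derives the states from trace events rather than taking them as input — so treat it as illustrative only:

```python
# Toy three-state utilization summary (assumed record format):
# each interval is (state, duration), where blocked communication
# counts as idle, non-blocked communication as overhead, and
# everything else as busy ("busy" need not mean useful work).
def utilization_summary(intervals):
    totals = {"busy": 0.0, "overhead": 0.0, "idle": 0.0}
    for state, duration in intervals:
        totals[state] += duration
    span = sum(totals.values())
    return {s: t / span for s, t in totals.items()}

summary = utilization_summary([
    ("busy", 6.0), ("overhead", 1.0), ("idle", 2.0), ("busy", 1.0),
])
```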
![Page 73: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/73.jpg)
ParaGraph visualizations
Task visualizations:
Display information about when processors start & stop tasks
Requires manually instrumented code to identify when processors start/stop tasks
Other visualizations:
Miscellaneous things
![Page 74: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/74.jpg)
Utilization visualizations – utilization count
Displays # of processors in each state at a given moment in time
Busy shown on bottom, overhead in middle, idle on top
Displays utilization state of each processor as a function of time (Gantt chart)
![Page 75: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/75.jpg)
Utilization visualizations – Kiviat diagram
Shows our friend, the Kiviat diagram
Each spoke is a single processor
Dark green shows moving average, light green shows current high watermark
Timing parameters for each can be adjusted
Metric shown can be “busy” or “busy + overhead”
![Page 76: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/76.jpg)
Utilization visualizations – streak
Shows “streak” of state
Similar to winning/losing streaks of baseball teams
Win = overhead or busy
Loss = idle
Not sure how useful this is
![Page 77: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/77.jpg)
Utilization visualizations – utilization summary
Shows percentage of time spent in each utilization state up to current time
![Page 78: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/78.jpg)
Utilization visualizations – utilization meter
Shows percentage of processors in each utilization state at current time
![Page 79: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/79.jpg)
Utilization visualizations – concurrency profile
Shows histograms of # processors in a particular utilization state
Ex: diagram shows only 1 processor was busy ~5% of the time, and all 8 processors were busy ~90% of the time
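A concurrency profile like the one above is just a histogram over time bins. A minimal sketch (the per-bin busy counts are assumed input; ParaGraph computes them from the trace):

```python
from collections import Counter

# Toy concurrency profile: per-time-bin counts of busy processors ->
# fraction of time each concurrency level was observed.
def concurrency_profile(busy_counts):
    hist = Counter(busy_counts)
    n = len(busy_counts)
    return {k: c / n for k, c in sorted(hist.items())}

# 8-processor run: all 8 busy 90% of the time, only 1 busy 10% of it.
profile = concurrency_profile([8] * 18 + [1] * 2)
```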
![Page 80: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/80.jpg)
Communication visualizations – color code
Color code controls colors used on most communication visualizations
Can have color indicate message sizes, message distance, or message tag
Distance computed by topology set in options menu
![Page 81: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/81.jpg)
Communication visualizations – communication traffic
Shows overall traffic at a given time
Bandwidth used, or number of messages in flight
Can show single node or aggregate of all nodes
![Page 82: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/82.jpg)
Communication visualizations – spacetime diagram
Shows standard space-time diagram for communication
Messages sent from node to node at which times
![Page 83: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/83.jpg)
Communication visualizations – message queues
Shows data about message queue lengths
Incoming/outgoing
Number of bytes queued / number of messages queued
Colors mean different things:
Dark color shows current moving average
Light color shows high watermark
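The two plotted quantities can be sketched as below. The windowed moving average and running-maximum watermark are assumptions about how the display is computed, not ParaGraph's documented algorithm:

```python
# Toy queue-length statistics: windowed moving average plus a
# running high watermark over a series of queue-length samples.
def queue_stats(samples, window=3):
    averages, watermark, high = [], [], 0
    for i, q in enumerate(samples):
        lo = max(0, i - window + 1)
        averages.append(sum(samples[lo:i + 1]) / (i - lo + 1))
        high = max(high, q)              # watermark never decreases
        watermark.append(high)
    return averages, watermark

avg, hw = queue_stats([0, 4, 2, 6, 1])
```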
![Page 84: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/84.jpg)
Communication visualizations – communication matrix
Shows which processors sent data to which other processors
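Building such a matrix from trace records is straightforward; a minimal sketch (the event field names are assumed, not MPICL's trace format):

```python
# Toy communication matrix: m[src][dest] = number of messages sent.
def comm_matrix(events, nprocs):
    m = [[0] * nprocs for _ in range(nprocs)]
    for e in events:
        m[e["src"]][e["dest"]] += 1
    return m

m = comm_matrix([{"src": 0, "dest": 1}, {"src": 0, "dest": 1},
                 {"src": 1, "dest": 0}], nprocs=2)
```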
![Page 85: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/85.jpg)
Communication visualizations – communication meter
Shows percentage of communication used at the current time
Message count or bandwidth
100% = max # of messages / max bandwidth used by the application at a specific time
![Page 86: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/86.jpg)
Communication visualizations – animation
Animates messages as they occur in trace file
Can overlay messages over topology
Available topologies: mesh, ring, hypercube, user-specified
Can lay out each node as you want
Can store layout to a file and load it later on
![Page 87: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/87.jpg)
Communication visualizations – node data
Shows detailed communication data
Can display metrics:
Which node
Message tag
Message distance
Message length
For a single node, or aggregate for all nodes
![Page 88: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/88.jpg)
Task visualizations – task count
Shows number of processors that are executing a task at the current time
At end of run, changes to show summary of all tasks
![Page 89: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/89.jpg)
Task visualizations – task Gantt
Shows Gantt chart of which task each processor was working on at a given time
![Page 90: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/90.jpg)
Task visualizations – task speed
Similar to Gantt chart, but displays “speed” of each task
Must record work done by task in instrumentation call (not done for example shown above)
![Page 91: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/91.jpg)
Task visualizations – task status
Shows which tasks have started and finished at the current time
![Page 92: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/92.jpg)
Task visualizations – task summary
Shows % time spent on each task
Also shows any overlap between tasks
![Page 93: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/93.jpg)
Task visualizations – task surface
Shows time spent on each task by each processor
Useful for seeing load imbalance on a task-by-task basis
![Page 94: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/94.jpg)
Task visualizations – task work
Displays work done by each processor
Shows rate and volume of work being done
Example doesn’t show anything because no work amounts recorded in trace being visualized
![Page 95: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/95.jpg)
Other visualizations – clock, coordinates
Clock: shows current time
Coordinate information: shows coordinates when you click on any visualization
![Page 96: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/96.jpg)
Other visualizations – critical path
Highlights critical path (longest serial path) in the space-time diagram in red
Depends on point-to-point communication (collective operations can throw it off)
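The underlying idea is a longest-path computation over a DAG of trace events linked by program order and message edges. A toy sketch (the event format — durations plus predecessor ids — is an assumption for illustration):

```python
from functools import lru_cache

# Toy critical path: longest total duration through an event DAG.
def critical_path_length(events):
    @lru_cache(maxsize=None)
    def longest(eid):
        e = events[eid]
        preds = e["preds"]
        # An event can start only after all its predecessors finish.
        return e["dur"] + (max(longest(p) for p in preds) if preds else 0)
    return max(longest(eid) for eid in events)

events = {
    0: {"dur": 2, "preds": []},      # P0 computes
    1: {"dur": 1, "preds": []},      # P1 computes
    2: {"dur": 3, "preds": [0, 1]},  # P1 receives P0's message, computes
}
length = critical_path_length(events)
```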
![Page 97: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/97.jpg)
Other visualizations – phase portrait
Shows relationship between processor utilization and communication usage
![Page 98: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/98.jpg)
Other visualizations – statistics
Gives overall statistics for run
Data:
% busy, overhead, idle time
Total count and bandwidth of messages
Max, min, average of: message size, distance, transit time
Shows max of 16 processors at a time
![Page 99: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/99.jpg)
Other visualizations – processor status
Shows processor status:
Which task each processor is executing
Communication (sends & receives)
Each processor is a square in the grid (8-processor example shown)
![Page 100: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/100.jpg)
Other visualizations – trace events
Shows text output of all trace file events
![Page 101: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/101.jpg)
WLC Techniques
![Page 102: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/102.jpg)
Static vs. Dynamic
Static: explore the intrinsic characteristics of the workload
Correlation between workload parameters and distributions
Techniques: clustering, principal component analysis, averaging, correlations
Dynamic: explore the characteristics of the workload over time
Predict the workload behavior in the future
Techniques: Markov chains, user behavior graphs, regression methods
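One of the static techniques above, clustering, groups jobs with similar resource demands. A toy k-means sketch (the job data and fixed initial centers are made up for determinism; real WLC studies use many more parameters):

```python
# Toy k-means over per-job workload parameters.
def kmeans(points, centers, iters=10):
    for _ in range(iters):
        groups = [[] for _ in centers]
        # Assign each point to its nearest center (squared distance).
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[dists.index(min(dists))].append(p)
        # Recompute each center as the mean of its group.
        centers = [tuple(sum(x) / len(g) for x in zip(*g)) for g in groups]
    return centers

# Workload parameters per job: (CPU seconds, MB of I/O) -- two clear classes.
jobs = [(1, 2), (2, 1), (1, 1), (9, 8), (8, 9), (9, 9)]
centers = kmeans(jobs, centers=[(0, 0), (10, 10)])
```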
![Page 103: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/103.jpg)
Discussion
![Page 104: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/104.jpg)
Proposed WLC Framework (diagram):
1. Requirements analysis
2. Measurements: collect data on UNIX/Linux/Windows XP via web access, ODBC (SQL, MS Access), and XML into a database
3. Model construction: data mining and analytical/statistical tools feed workload characterization and the workload model
4. Model validation: apply a representativeness criterion; if the model is representative, execute it, otherwise calibrate the model and repeat
5. Evaluation: predictive analysis and response-time analysis
6. Visualize results: graphical and real-time analysis
Example inset: node CPU utilization (% processor, 0–80) by priority (high/medium/low), 10 AM to 3 PM
![Page 105: Survey of Performance Evaluation Tools Last modified: 10/18/05](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649e485503460f94b3b69c/html5/thumbnails/105.jpg)
References
Network monitoring tools – http://www.caida.org/tools/
PacketBench – network traffic
Rubicon – I/O
The Tracefile Testbed – http://www.nacse.org/perfdb/index.html