Performance of multiprocessing systems: Benchmarks and performance counters Miodrag Bolic ELG7187 Topics in Computers: Multiprocessor Systems on Chip




Outline

• Benchmarks
• Measurements and monitoring
• Performance counters


Types of benchmarks [1]

• Synthetic benchmarks
  – small artificial programs containing a mixture of statements selected to be representative of a large class of real applications.
• Kernel benchmarks
  – small but relevant parts of real applications which typically account for a large portion of the execution time of real applications.

• Real application benchmarks


Benchmarks: challenges

• Challenges in developing benchmarks
  – Testing a whole system: CPU, cache, main memory, compilers
  – Selecting a suitable set of applications
  – Making benchmarks portable (ANSI C: How big is a long? How big is a pointer? Does this platform implement calloc? Is it little endian or big endian?)

• Fixed workload benchmarks: how fast the workload was completed.
  – EEMBC MPEG-x benchmark: time to process the entire video
• Throughput benchmarks: how many workload units were completed per unit time.
  – EEMBC MPEG-x benchmark: number of frames processed in a fixed amount of time
• Base metrics: the same compiler flags must be used, in the same order, for all benchmarks.
• Peak metrics: different compiler options may be used for each benchmark.
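The difference between a fixed workload measurement and a throughput measurement can be sketched in a few lines of Python. This is only an illustration: `process_frame` is a hypothetical stand-in for decoding one video frame, and the frame count and time budget are arbitrary assumptions, not values taken from EEMBC.

```python
import time


def process_frame(frame_id: int) -> int:
    """Hypothetical stand-in for decoding one video frame."""
    return sum(i * i for i in range(2000)) + frame_id


def fixed_workload_seconds(n_frames: int) -> float:
    """Fixed workload: time needed to process all n_frames."""
    start = time.perf_counter()
    for frame_id in range(n_frames):
        process_frame(frame_id)
    return time.perf_counter() - start


def throughput_frames(budget_seconds: float) -> int:
    """Throughput: number of frames started before the time budget expires."""
    deadline = time.perf_counter() + budget_seconds
    frames = 0
    while time.perf_counter() < deadline:
        process_frame(frames)
        frames += 1
    return frames


elapsed = fixed_workload_seconds(200)   # metric: seconds (lower is better)
frames = throughput_frames(0.05)        # metric: frames (higher is better)
print(f"fixed workload: 200 frames in {elapsed:.4f} s")
print(f"throughput: {frames} frames in 0.05 s")
```

Both functions run the same work; only the quantity held constant (amount of work or amount of time) and the reported metric differ.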


Available Benchmarks [2]

• SPEC CPU (general purpose)
• MediaBench (media)
• BioPerf (bioinformatics)
• PARSEC (multi-threaded workloads on multicore processors)
• DaCapo (to evaluate Java workloads)
• STAMP (to evaluate transactional memory)


SPEC

• Each of the programs is executed three times on the computer system U to be tested. For each program A_i, a representative execution time T_U(A_i) in seconds is determined by taking the median of the three measured execution times.
• For each program, the execution time T_U(A_i) from the previous step is normalized with respect to the reference computer R by dividing the execution time T_R(A_i) on R by the execution time T_U(A_i) on U. This yields an execution factor F_U(A_i) = T_R(A_i) / T_U(A_i).
  – R: Sun Ultra Enterprise 2 with a 296 MHz UltraSPARC II processor
• SPECint2006 is computed as the geometric mean of the execution factors of the 12 SPEC integer programs.
• Geometric mean:
  – the comparison between two machines is independent of the choice of the reference computer;
  – it does not provide information about the actual execution time of the programs.
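The SPEC-style computation described above (execution factors, then their geometric mean) can be sketched as follows. All execution times are made-up numbers for three hypothetical programs, and the function names are illustrative assumptions. The sketch also checks the property that the ratio of two machines' scores does not depend on which reference machine is used.

```python
import math


def execution_factors(t_ref, t_test):
    """F_U(A_i) = T_R(A_i) / T_U(A_i) for each program A_i."""
    return [r / u for r, u in zip(t_ref, t_test)]


def geometric_mean(values):
    """n-th root of the product, computed via logarithms for stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical median execution times in seconds (not real SPEC data).
t_ref_1 = [100.0, 200.0, 400.0]   # reference machine R
t_ref_2 = [50.0, 400.0, 100.0]    # a different reference machine R'
t_u = [50.0, 100.0, 100.0]        # machine U under test
t_v = [25.0, 50.0, 100.0]         # machine V under test


def score(t_ref, t_test):
    return geometric_mean(execution_factors(t_ref, t_test))


ratio_with_r1 = score(t_ref_1, t_u) / score(t_ref_1, t_v)
ratio_with_r2 = score(t_ref_2, t_u) / score(t_ref_2, t_v)
print(ratio_with_r1, ratio_with_r2)  # the two ratios agree up to rounding
```

The reference times cancel in the ratio of two geometric means, which is why the comparison of U and V is the same under either reference. The absolute scores, however, say nothing about how long the programs actually ran.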


Measurement [3]

• Measurement-based evaluation relies on directly measuring the system under study using a software and/or hardware monitor.
• A monitor performs three tasks:
  – data acquisition,
  – data analysis,
  – result output.
• An event is a change in the system state.
  – Examples are a process context switch, the beginning of a seek on a disk, and the arrival of a packet.
• A trace is a log of events.
  – It includes the time of the event, the type of event, etc.


Activating a monitor [3]

• Tracing (event-driven monitor): when an event occurs, the monitor is activated to capture data about the state of the system. This gives a complete trace of the executing program.
• Sampling: the monitor is activated by clock interrupts.
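A toy model can make the contrast between the two activation modes concrete. The event stream, the tick-based clock, and the function names below are all hypothetical; a real monitor would hook into the operating system or hardware rather than iterate over a list.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time: int   # timestamp in clock ticks
    kind: str   # type of event


# Hypothetical event stream, sorted by time.
events = [
    Event(3, "context_switch"),
    Event(4, "packet_arrival"),
    Event(11, "disk_seek"),
    Event(12, "context_switch"),
    Event(25, "packet_arrival"),
]


def trace(event_stream):
    """Event-driven monitor: activated on every event, so the log is complete."""
    return [(e.time, e.kind) for e in event_stream]


def sample(event_stream, period, end_time):
    """Sampling monitor: activated every `period` ticks; at each activation it
    records only the most recent event that has occurred so far."""
    samples = []
    for tick in range(period, end_time + 1, period):
        latest = None
        for e in event_stream:
            if e.time <= tick:
                latest = e.kind
        samples.append((tick, latest))
    return samples


full_trace = trace(events)
samples = sample(events, period=10, end_time=30)
print(len(full_trace), "events traced")
print(samples)
```

With a period of 10 ticks the sampler never observes the disk seek at tick 11, because it is overwritten by the context switch at tick 12 before the next activation. Tracing sees everything but is activated five times instead of three, which is the usual trade-off between completeness and overhead.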


Performance counters

• Time-based profiles: where your software spends its time.
• Hardware performance measurements: what the processor is doing and how effectively it is being utilized.
• Hardware measurements also pinpoint particular reasons why the CPU is stalling rather than accomplishing useful work.
• http://perfsuite.ncsa.uiuc.edu/publications/LJ135/t1.html


Advantages [4]

• The application and operating system remain largely unmodified, apart from the addition of drivers in the operating system to enable access to the hardware performance counters.
• Because the application, operating system, and processor are not simulated, the collected event counts are accurate.
• Performance-monitoring hardware collects data on the fly as the application executes, allowing full-speed data collection and avoiding the slowness of simulation-based approaches.
• This approach can collect data for both the application and the operating system.


Performance monitoring [4]

• Performance events can be grouped into:
  – program characterization,
  – memory accesses,
  – pipeline stalls,
  – branch prediction,
  – resource utilization.
• Performance-monitoring hardware has two components:
  – performance event detectors,
  – event counters.


MIPS R10000 [5]

• Count enable bits for User, Supervisor, Kernel, and/or Exception level mode. Any combination of count enable bits may be asserted.
• Event select
• IP[7] interrupt enable


Intel’s solution

• Hardware performance counters are defined outside the "architectural" register set, and they are not saved and restored on process context switches.
• The measurements are therefore attached to the processor, and not to a process or thread.
• It is possible to separate user code from system code according to the privilege level.
• The Intel Pentium-series processors include a 64-bit cycle counter and two 40-bit event counters, with a list of events and additional semantics that depend on the particular processor.
• The AMD Athlon processor has a 64-bit cycle counter and four 48-bit event counters.
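Because the event counters have a finite width (40 or 48 bits in the processors listed above), software that computes the number of events between two reads has to allow for wraparound. The sketch below shows the usual modular-arithmetic approach; the function names and the 1 GHz event rate are illustrative assumptions, and the delta is only correct if the counter wrapped at most once between the two reads.

```python
def counter_delta(start: int, end: int, width_bits: int) -> int:
    """Events counted between two reads of a width_bits-wide counter.

    Assumes the counter wrapped around at most once between the reads.
    """
    return (end - start) % (1 << width_bits)


def seconds_until_wrap(width_bits: int, events_per_second: float) -> float:
    """Time for a counter starting at zero to overflow at a constant rate."""
    return (1 << width_bits) / events_per_second


# A 40-bit event counter read shortly before and shortly after it wraps.
before = (1 << 40) - 100
after = 50
print(counter_delta(before, after, 40))  # 150

# Hypothetical rate: one event per cycle on a 1 GHz clock.
print(seconds_until_wrap(40, 1e9))
```

At the assumed rate a 40-bit counter overflows after roughly 18 minutes, so a monitoring tool has to read it (or take an overflow interrupt) more often than that.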


Using performance counters [4]

• Scheduling
  – A single per-core metric (such as IPC or cache miss rate) is not sufficient to categorize application behavior.
    • Different thread types often have highly varying characteristics.
    • Threads behave differently depending on which thread was scheduled beforehand.
• Tuning memory access
• Communication pattern
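The metrics mentioned under scheduling are simple ratios of raw counter values. The following sketch derives IPC and cache miss rate for two hypothetical threads; the counter values are invented for illustration and the class name is an assumption. It shows why one metric alone can be misleading: both threads have the same IPC but very different cache behavior.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Raw event counts read from performance counters over one interval."""
    cycles: int
    instructions: int
    cache_accesses: int
    cache_misses: int

    @property
    def ipc(self) -> float:
        """Instructions per cycle."""
        return self.instructions / self.cycles

    @property
    def miss_rate(self) -> float:
        """Fraction of cache accesses that missed."""
        return self.cache_misses / self.cache_accesses


# Hypothetical counter readings for two threads.
thread_a = CounterSample(cycles=1_000_000, instructions=800_000,
                         cache_accesses=200_000, cache_misses=2_000)
thread_b = CounterSample(cycles=2_000_000, instructions=1_600_000,
                         cache_accesses=100_000, cache_misses=20_000)

for name, s in (("A", thread_a), ("B", thread_b)):
    print(f"thread {name}: IPC={s.ipc:.2f}, miss rate={s.miss_rate:.2%}")
```

A scheduler that looked only at IPC would treat the two threads as equivalent, whereas the miss rates suggest they would interact with a shared cache quite differently.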


Problem with performance counters [6]


Advanced performance counters [6]


Software [4]

• The Performance Application Programming Interface (PAPI) tool
  – provides a common interface to performance-monitoring hardware for many different processors, including Alpha, Athlon, Cray, Itanium, MIPS, Pentium, PowerPC, and UltraSparc;
  – lets the user start, reset, and read counters.
• Intel’s VTune Performance Analyzer
  – supports all Intel Pentium and Itanium processors;
  – provides additional performance analysis tools such as call graph profiling and processor-specific tuning advice.


Other approaches for collecting processor performance data [4]

• Software monitoring
  – Modify the code to collect data.
  – Requires access to the source code and the ability to rebuild the application.

• Simulators
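The software monitoring approach listed above amounts to instrumenting the program itself. A minimal sketch in Python, assuming a hypothetical `monitored` decorator and a toy `dot` function as the code being measured:

```python
import functools
import time
from collections import defaultdict

# Data collected by the instrumentation, keyed by function name.
call_counts = defaultdict(int)
total_seconds = defaultdict(float)


def monitored(func):
    """Wrap func so that every call is counted and timed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            call_counts[func.__name__] += 1
            total_seconds[func.__name__] += time.perf_counter() - start
    return wrapper


@monitored
def dot(xs, ys):
    """Toy workload: dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(xs, ys))


for _ in range(3):
    result = dot([1, 2, 3], [4, 5, 6])

print(result, call_counts["dot"], total_seconds["dot"])
```

The sketch makes the costs of this approach visible: the source had to be edited to add the decorator, and the timing calls themselves add overhead to every monitored call, which hardware counters avoid.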


References

1. Thomas Rauber, Gudula Rünger, Parallel Programming: For Multicore and Cluster Systems, Springer, 2010 (Chapter 4).
2. Lieven Eeckhout, Computer Architecture Performance Evaluation Methods, Synthesis Lectures on Computer Architecture, June 2010.
3. Lei Hu and Ian Gorton, Performance Evaluation for Parallel Systems: A Survey, University of NSW, Australia, UNSW-CSE-TR-9707, October 1997.
4. B. Sprunt, The Basics of Performance Monitoring Hardware, IEEE Micro, July-August 2002, pp. 64-71.
5. MIPS Technologies, MIPS R10000 Microprocessor User’s Manual, Ver 2.0, 1996. http://techpubs.sgi.com/library/manuals/2000/007-2490-001/pdf/007-2490-001.pdf
6. V. Salapura et al., “Next Generation Performance Counters: Towards Monitoring over thousand concurrent events,” IBM Research Report, RC24351, 2007.


Additional material covered in the lecture

1. Geometric mean computation [1]