Nöthnitzer Straße 46
Room 1026
Tel. +49 351 - 463 - 35048
Holger Brunst
Matthias S. Mueller
Center for Information Services and High Performance Computing (ZIH)
Performance Analysis of Computer Systems
3 November 2011
Holger Brunst, Matthias Müller: Leistungsanalyse (Performance Analysis)
Summary of Previous Lecture (1)
Remarks: Doherty (1970)
Performance is the degree to which a computing system meets expectations
of the persons involved in it.
Main objective: Get the highest performance for a given cost
System:
An arbitrary collection of hardware, software, and firmware:
e.g. CPU, database, network of computers
Metric:
A criterion used to evaluate the performance of a system:
e.g. response time, throughput, FLOPS
Workload:
The sum of all user requests to a system
e.g.: CPU workload: Collection of instructions to execute
Summary of Previous Lecture (2)
Discussion of performance analysis examples and questions
– Selection of technique, metric, and workload
– Correctness of performance measurements
– Measurement and simulation design
The art of performance analysis
– Successful evaluation cannot be produced mechanically
– Evaluation requires detailed knowledge of the system to be modeled
10 steps for systematic performance evaluation
1. State goals
2. List services and outcomes
3. Select metrics
4. List parameters that affect performance
5. Select factors to study
6. Select technique for evaluation
7. Select workload
8. Design experiments
9. Analyze and interpret data
10. Present results
Summary of Previous Lecture: Questions
What does performance mean?
What are the main reasons to do a performance analysis?
What are the main tasks?
What’s a system in performance analysis terminology?
What do the terms metric and workload stand for?
What’s a performance parameter?
What’s a performance factor?
Parallel Metrics
Excursion on Speedup and Efficiency Metrics
Comparison of sequential and parallel algorithms
Speedup:
$$S_n = \frac{T_1}{T_n}$$
– n is the number of processors
– T_1 is the execution time of the sequential algorithm
– T_n is the execution time of the parallel algorithm with n processors
Efficiency:
$$E_n = \frac{S_n}{n}$$
– Its value estimates how well n processors are utilized when solving a given problem
– Usually between zero and one. Exception: superlinear speedup (discussed later)
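The two definitions can be turned into a small calculation. This is a minimal sketch; the timings (120 s sequential, 20 s on 8 processors) are hypothetical values chosen only for illustration.

```python
def speedup(t1: float, tn: float) -> float:
    """S_n = T_1 / T_n: sequential time divided by parallel time on n processors."""
    return t1 / tn


def efficiency(t1: float, tn: float, n: int) -> float:
    """E_n = S_n / n: fraction of the ideal (linear) speedup actually achieved."""
    return speedup(t1, tn) / n


# Hypothetical timings: 120 s sequentially, 20 s on 8 processors.
s8 = speedup(120.0, 20.0)
e8 = efficiency(120.0, 20.0, 8)
print(f"S_8 = {s8:.2f}, E_8 = {e8:.2f}")
```

An efficiency above one would indicate superlinear speedup.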
Amdahl’s Law
Used to find the maximum expected improvement to an overall system when only part of the system is improved
Serial execution time = s+p
Parallel execution time = s+p/n
– Normalizing with respect to serial time (s+p) = 1 results in:
• Sn = 1/(s+p/n)
– Drops off rapidly as the serial fraction increases
– Maximum possible speedup = 1/s, independent of the number of processors n!
Bad news: If an application has only 1% serial work (s = 0.01), you will never see a speedup greater than 100. So why do we build systems with more than 100 processors?
What is wrong with this argument?
$$S_n = \frac{s + p}{s + \frac{p}{n}}$$
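To see how quickly the 1/s ceiling is reached, the normalized form S_n = 1/(s + p/n) with p = 1 - s can be evaluated for growing n. This is a sketch; the serial fraction s = 0.01 is taken from the example above, and the processor counts are arbitrary.

```python
def amdahl_speedup(s: float, n: int) -> float:
    """Fixed-size speedup S_n = 1 / (s + p/n), with serial fraction s and p = 1 - s."""
    p = 1.0 - s
    return 1.0 / (s + p / n)


# With 1% serial work the speedup creeps toward, but never exceeds, 1/s = 100.
for n in (10, 100, 1000, 10**6):
    print(n, round(amdahl_speedup(0.01, n), 1))
```

Going from 100 to 1000 processors still raises the speedup considerably, whereas going beyond that gains very little.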
Scaled Speedup (Gustafson-Barsis’ Law)
Amdahl’s speedup equation assumes p is independent of n; in other words,
the problem size remains the same
Gustafson-Barsis’ law states that any sufficiently large problem can be
efficiently parallelized
It is more realistic to assume that the “runtime” remains the same, NOT the problem size
If the problem size scales up, does the serial part also increase?
Parallel execution time = s+p
Serial execution time = s+np
– Normalizing with respect to the parallel execution time (s + p = 1) results in:
$$S^s_n = \frac{s + np}{s + p} = n + (1-n)\,s = p\,(n-1) + 1$$
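A short sketch contrasting the scaled speedup with Amdahl's fixed-size speedup. Note the assumption built into the comparison: in Gustafson-Barsis' law, s is the serial fraction of the *parallel* run time, which is not the same quantity as Amdahl's s, so using 0.01 for both is only meant to illustrate the difference in outlook.

```python
def gustafson_speedup(s: float, n: int) -> float:
    """Scaled speedup S^s_n = n + (1 - n) * s, where s is the serial fraction
    of the parallel run time (normalized so that s + p = 1)."""
    return n + (1 - n) * s


def amdahl_speedup(s: float, n: int) -> float:
    """Fixed-size speedup 1 / (s + (1 - s) / n), for comparison."""
    return 1.0 / (s + (1.0 - s) / n)


n = 1024
print(f"scaled (Gustafson-Barsis): {gustafson_speedup(0.01, n):.2f}")
print(f"fixed size (Amdahl):       {amdahl_speedup(0.01, n):.2f}")
```

The scaled speedup grows almost linearly with n because the parallel part is assumed to grow with the machine while the serial part does not.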
Efficiency and Serial Fraction
Strong scalability vs. weak scalability
E_n = S_n/n does not tell the whole story
– is it necessarily bad if efficiency drops as you increase n for a given
problem size?
s is supposed to be a constant
– this assumes that the work is load balanced
– and that there is no overhead for synchronizing the processors
Experimentally measure the serial fraction
– if s does not remain constant, what can we discern?
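One common way to measure the serial fraction experimentally, not named on the slide, is the Karp-Flatt metric: solve Amdahl's law S_n = 1/(s + (1-s)/n) for s, using the measured speedup. The measurements below are made up for illustration.

```python
def experimental_serial_fraction(measured_speedup: float, n: int) -> float:
    """Invert Amdahl's law for s (Karp-Flatt metric):
    s = (1/S_n - 1/n) / (1 - 1/n)."""
    if n < 2:
        raise ValueError("need at least two processors")
    return (1.0 / measured_speedup - 1.0 / n) / (1.0 - 1.0 / n)


# Hypothetical fixed-size measurements: (processors, measured speedup).
measurements = [(2, 1.95), (4, 3.70), (8, 6.60), (16, 10.50)]
for n, sp in measurements:
    print(n, sp, round(experimental_serial_fraction(sp, n), 4))
```

With these made-up numbers the estimate rises from about 0.026 to about 0.035 as n grows. A rising estimate usually points to overheads such as synchronization or load imbalance rather than to a fixed serial part.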
Superlinear/Superunitary Speedup
Work in algorithm: W = W_real + W_ovhd
What is W_ovhd?
Super-unitary speedup possible if total work done by n processors is strictly
less than that done by a single processor
Reasons for super-unitary speedup
– Memory and cache effects
– Dividing up resource management overheads
– Hiding latency for remote operations
– Randomized algorithms
In the literature, superlinear speedup is sometimes also referred to as superunitary
speedup, which may be the mathematically more correct term
System under Test
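Figure (layers of the system under test): application; programming languages (C++, Fortran, C); parallel programming models (MPI, OpenMP); compiler; runtime; operating system (Linux, Windows); hardware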
Code Size of HPC Software relative to other Systems

| Software | Lines of Code | Person Years |
|---|---|---|
| Windows NT 3.1 | ~4,500,000 | 900 |
| Linux Kernel 2.6.0 | ~5,200,000 | 1040 |
| Lustre | >500,000 | 100 |
| Open MPI 1.3.3 | ~525,000 | 105 |
| Open64 compiler 4.2.1 | ~1,139,000 | 227 |
| HPCC | ~50,000 | 10 |
| VampirServer + VampirClient | ~300,000 | 60 |
| VampirTrace | ~80,000 | 16 |
| Marmot | ~65,000 | 13 |
Compare different Compilers with SPEC OMPM2001
Code Tuning: different compiler flags
Result of disk performance tests
Figure: histogram "Disktest on 622 Nodes", showing the number of nodes in each disk speed class [MB/s]; avg: 45.94 MB/s, max: 66.30 MB/s, min: 2.10 MB/s
Result of one SPEC OMPM application
– Histogram of 320.equake runtime on dual CPU nodes
– Sharp distribution indicates a healthy execution environment
Result of one SPEC OMPM application
– Histogram of 310.wupwise runtime on dual CPU nodes
– Shows huge variation in runtime
– The problem was identified as a BIOS bug
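The difference between a "sharp" and a "widely spread" runtime histogram can be quantified. The sketch below uses the coefficient of variation and flags nodes that run much slower than the median. The node names, runtimes, and the 10% tolerance are all hypothetical.

```python
import statistics


def relative_spread(runtimes: list[float]) -> float:
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return statistics.stdev(runtimes) / statistics.mean(runtimes)


def flag_slow_nodes(runtimes_by_node: dict[str, float], tolerance: float = 0.10) -> list[str]:
    """Return nodes whose runtime exceeds the median by more than `tolerance`."""
    median = statistics.median(runtimes_by_node.values())
    return sorted(node for node, t in runtimes_by_node.items() if t > (1 + tolerance) * median)


# Hypothetical runtimes (seconds) of the same benchmark on nominally identical nodes.
runs = {"node01": 101.2, "node02": 100.8, "node03": 99.9, "node04": 137.5, "node05": 100.4}
print(f"CV = {relative_spread(list(runs.values())):.3f}")
print("suspicious:", flag_slow_nodes(runs))
```

Using the median rather than the mean as a baseline keeps a few faulty nodes from shifting the threshold.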
Workload types, selection and characterization
Types of Workloads
Test workload:
– Any workload used in performance studies
– Real or synthetic
Real workload:
– Observed on a system being used for normal operation
– Cannot be repeated
– May contain sensitive data
Synthetic workload:
– Should be representative of a real workload
– Often smaller in size
Historical examples for test workloads
Addition instruction
Instruction mixes
Kernels
Synthetic programs
Application benchmarks
Popular benchmarks: Sieve of Eratosthenes
Algorithm to find prime numbers
Kernel
Simple
As an algorithm, it is independent of any particular programming language or
implementation
Not very representative of today's use of computers
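A minimal sketch of the sieve used as a timed kernel. The limit of 10^6 is an arbitrary choice, and the timing reflects the interpreter as much as the machine, which illustrates why such a kernel says little about real workloads.

```python
import math
import time


def sieve(limit: int) -> list[int]:
    """Return all primes <= limit using the sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            # Cross out multiples of i; smaller multiples were already
            # crossed out by smaller primes, so start at i * i.
            is_prime[i * i :: i] = bytes(len(range(i * i, limit + 1, i)))
    return [i for i, flag in enumerate(is_prime) if flag]


start = time.perf_counter()
primes = sieve(1_000_000)
elapsed = time.perf_counter() - start
print(f"{len(primes)} primes up to 10^6 in {elapsed:.3f} s")
```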
Popular benchmarks: Ackermann’s Function
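$$\mathrm{Ackermann}(m,n) := \begin{cases} n+1 & \text{if } m = 0 \\ \mathrm{Ackermann}(m-1,\,1) & \text{if } m > 0,\ n = 0 \\ \mathrm{Ackermann}(m-1,\,\mathrm{Ackermann}(m,\,n-1)) & \text{otherwise} \end{cases}$$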
Used to assess the efficiency of procedure calls
Ackermann(3,n) requires
$\frac{512\cdot 4^{n-1} - 15\cdot 2^{n+3} + 9n + 37}{3}$ calls and
a stack size of $2^{n+3} - 4$
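The closed-form call count can be checked by instrumenting a recursive implementation. This sketch uses the standard argument order Ackermann(m, n), which is the order under which the Ackermann(3,n) formula holds; the helper names are illustrative.

```python
def ackermann_with_calls(m: int, n: int) -> tuple[int, int]:
    """Evaluate Ackermann(m, n) recursively; return (value, number of calls)."""
    calls = 0

    def ack(m: int, n: int) -> int:
        nonlocal calls
        calls += 1  # every invocation is one procedure call being measured
        if m == 0:
            return n + 1
        if n == 0:
            return ack(m - 1, 1)
        return ack(m - 1, ack(m, n - 1))

    value = ack(m, n)
    return value, calls


def predicted_calls(n: int) -> int:
    """Closed-form call count for Ackermann(3, n) as given on the slide."""
    return (512 * 4 ** (n - 1) - 15 * 2 ** (n + 3) + 9 * n + 37) // 3


for n in range(1, 6):
    value, calls = ackermann_with_calls(3, n)
    print(f"n={n}: value={value}, calls={calls}, predicted={predicted_calls(n)}")
```

Because the recursion depth grows like 2^(n+3), values of n around 7 and above approach Python's default recursion limit of 1000 frames.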
Popular benchmarks: Whetstone
Used at the British Central Computer Agency
11 modules
Representative of 949 ALGOL programs
Available in ALGOL, FORTRAN, PL/I and other languages
See Curnow and Wichmann (1975)
Results in KWIPS (Kilo Whetstone Instructions Per Second)
Workload characteristics:
– Floating point intensive
– Cache friendly
– No I/O
Popular benchmarks: LINPACK
Developed by Jack Dongarra (1983) at ANL (now ICL, UTK)
Solves a dense system of linear equations
Algorithmic definition of the benchmark
Reference implementation available (HPL)
Makes heavy use of BLAS
One fixed dataset: 100x100
Used as the benchmark for the TOP500 list
Many vendors have their own hand-tuned implementations
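To make the idea of an "algorithmic definition" concrete, here is a toy, pure-Python dense solve with partial pivoting, rated with the operation count conventionally used for LINPACK (2/3 n^3 + 2 n^2). It is a sketch only: it is not HPL, and the random matrix, seed, and right-hand side are assumptions chosen so that the exact solution is all ones.

```python
import random
import time


def solve_dense(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting.
    Works on an augmented copy, so the inputs are left unchanged."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        # Partial pivoting: move the row with the largest |entry| in column k to row k.
        pivot = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[pivot] = m[pivot], m[k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


n = 100  # the classic LINPACK problem size
rng = random.Random(42)
a = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
b = [sum(row) for row in a]  # right-hand side whose exact solution is all ones

start = time.perf_counter()
x = solve_dense(a, b)
elapsed = time.perf_counter() - start

flops = 2 / 3 * n**3 + 2 * n**2
print(f"max error {max(abs(xi - 1) for xi in x):.2e}, {flops / elapsed / 1e6:.1f} Mflop/s")
```

The Mflop/s figure mostly measures interpreter overhead, which is exactly why vendors provide hand-tuned implementations built on optimized BLAS.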
Popular benchmarks: Dhrystone
Developed in 1984 by Reinhold Weicker at Siemens
Represents systems programming environments
Available in C, Pascal and Ada
Results are in Dhrystone Instructions Per Second (DIPS)
Includes ground rules for building and executing Dhrystone (run rules)
Popular Benchmarks: Lawrence Livermore Loops
24 separate tests
Largely vectorizable
Assembled at LLNL (see McMahon 1986)
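What "largely vectorizable" means can be illustrated with two of the kernels, written here as they are commonly reproduced (check McMahon's report for the exact Fortran): Kernel 1, the hydro fragment, has independent iterations, while Kernel 5, tri-diagonal elimination, carries a dependence from one iteration to the next. The input values are arbitrary.

```python
def kernel1_hydro(x, y, z, q, r, t, n):
    """Kernel 1 (hydro fragment): x[k] = q + y[k] * (r * z[k+10] + t * z[k+11]).
    No iteration reads a value written by another, so the loop vectorizes."""
    for k in range(n):
        x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11])


def kernel5_tridiag(x, y, z, n):
    """Kernel 5 (tri-diagonal elimination): x[i] needs x[i - 1], a loop-carried
    dependence that prevents straightforward vectorization."""
    for i in range(1, n):
        x[i] = z[i] * (y[i] - x[i - 1])


n = 8
x = [0.0] * n
y = [1.0] * n
z = [1.0] * (n + 11)  # kernel 1 reads up to z[n + 10]
kernel1_hydro(x, y, z, q=0.5, r=2.0, t=3.0, n=n)
print(x)
```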
Popular Benchmarks: Transaction Processing (TPC-C)
Successor of the Debit-Credit Benchmark
TPC-C is an on-line transaction processing benchmark
Results report performance (tpmC) and price/performance ($/tpmC)
The reported system has to be available to customers (at that price)
Running the benchmark requires a costly setup
SPEC groups and benchmarks
Open Systems Group (desktop systems, high-end workstations and servers)
– CPU (CPU benchmarks)
– JAVA (Java client- and server-side benchmarks)
– MAIL (mail server benchmarks)
– SFS (file server benchmarks)
– WEB (web server benchmarks)
High Performance Group (HPC systems)
– OMP (OpenMP benchmark)
– HPC (HPC application benchmark)
– MPI (MPI application benchmark)
Graphics Performance Group (graphics)
– Apc (Graphics application benchmarks)
– Opc (OpenGL performance benchmarks)
Workload Selection
System under Study
Seems to be an easy thing to define
Be aware of different abstraction layers
Example: ISO/OSI reference model for computer networks:
7. Application (mail, FTP)
6. Presentation (data compression, ...)
5. Session (dialogs)
4. Transport (messages)
3. Network (packets)
2. Data link (frames)
1. Physical (bits)
Level of Detail of the workload description
Examples:
– Most frequent request (e.g. Addition)
– Frequency of request type (instruction mix)
– Time-stamped sequence of requests
– Average resource demand (e.g. 20 I/O requests per second)
– Distribution of resource demands (not only the average, but also
probability distribution)
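The levels of detail above can all be derived from one time-stamped request trace. This sketch uses a small hypothetical trace to compute the request mix, the average request rate, and the distribution of inter-arrival times.

```python
from collections import Counter
from statistics import mean, quantiles

# Hypothetical time-stamped request trace: (timestamp in seconds, request type).
trace = [(0.00, "read"), (0.05, "read"), (0.07, "write"), (0.30, "read"),
         (0.31, "cpu"), (0.55, "read"), (0.90, "write"), (1.00, "read")]

# Frequency of each request type (the "instruction mix" level of detail).
counts = Counter(kind for _, kind in trace)
mix = {kind: c / len(trace) for kind, c in counts.items()}

# Inter-arrival times: the basis for a distribution, not just an average.
gaps = [b[0] - a[0] for a, b in zip(trace, trace[1:])]

# Average demand: arrivals per second over the observed interval.
duration = trace[-1][0] - trace[0][0]
rate = len(gaps) / duration

print("mix:", mix)
print(f"rate: {rate:.1f} req/s, mean gap {mean(gaps):.3f} s, quartiles {quantiles(gaps, n=4)}")
```

Two traces can share the same mix and average rate yet differ strongly in their inter-arrival distribution, for example bursty versus evenly spaced requests.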
Representativeness
After all, benchmarks are not an end in themselves; they should represent real
workloads.
Different characteristics to consider:
– Arrival rate of requests
– Resource demands
– Resource usage profile (sequence and amounts of resources used by an
application)
To be representative, a test workload has to follow changes in user behavior in a
timely fashion!
SPEC Benchmarks
Lecture: Performance Analysis (Leistungsanalyse)
Outline
What is SPEC?
Who is SPEC?
Some SPEC benchmarks:
– SPEC CPU
– SPEC HPC
– SPEC OMP
– SPEC MPI
Summary
What and who is SPEC?
What is SPEC?
The Standard Performance Evaluation Corporation (SPEC) is a non-profit
corporation formed to establish, maintain and endorse a standardized set of
relevant benchmarks that can be applied to the newest generation of high-
performance computers. SPEC develops suites of benchmarks and also
reviews and publishes submitted results from our member organizations and
other benchmark licensees.
For more details see http://www.spec.org
SPEC Members
SPEC Members:
3DLabs * Acer Inc. * Advanced Micro Devices * Apple Computer, Inc. * ATI Research * Azul Systems, Inc. * BEA Systems * Borland * Bull S.A. * CommuniGate Systems * Dell * EMC * Exanet * Fabric7 Systems, Inc. * Freescale Semiconductor, Inc. * Fujitsu Limited * Fujitsu Siemens * Hewlett-Packard * Hitachi Data Systems * Hitachi Ltd. * IBM * Intel * ION Computer Systems * JBoss * Microsoft * Mirapoint * NEC - Japan * Network Appliance * Novell * NVIDIA * Openwave Systems * Oracle * P.A. Semi * Panasas * PathScale * The Portland Group * S3 Graphics Co., Ltd. * SAP AG * SGI * Sun Microsystems * Super Micro Computer, Inc. * Sybase * Symantec Corporation * Unisys * Verisign * Zeus Technology *
SPEC Associates:
California Institute of Technology * Center for Scientific Computing (CSC) * Defence Science and Technology Organisation - Stirling * Dresden University of Technology * Duke University * JAIST * Kyushu University * Leibniz Rechenzentrum - Germany * National University of Singapore * New South Wales Department of Education and Training * Purdue University * Queen's University * Rightmark * Stanford University * Technical University of Darmstadt * Texas A&M University * Tsinghua University * University of Aizu - Japan * University of California - Berkeley * University of Central Florida * University of Illinois - NCSA * University of Maryland * University of Modena * University of Nebraska, Lincoln * University of New Mexico * University of Pavia * University of Stuttgart * University of Texas at Austin * University of Texas at El Paso * University of Tsukuba * University of Waterloo * VA Austin Automation Center *
SPEC HPG = SPEC High-Performance Group
Founded in 1994
Mission: To establish, maintain, and endorse a suite of
benchmarks that are representative of real-world high-
performance computing applications.
SPEC/HPG includes members from both industry and academia.
Benchmark products:
– SPEC OMP (OMPM2001, OMPL2001)
– SPEC HPC2002 released at SC 2002
– SPEC MPI (under development)
Currently active SPEC HPG Members
Fujitsu
HP
IBM
Intel
SGI
SUN
UNISYS
Purdue University
Technische Universität Dresden
HPG (High Performance Group) Benchmark Suites
– Jan 1994: Founding of SPEC HPG
– 1996: HPC96
– June 2001: OMP2001
– June 2002: OMPL2001
– Jan 2003: HPC2002
– 2007: MPI2007
Overview and Positioning
Where is SPEC Relative to Other Benchmarks?
There are many metrics; each one has its purpose
Raw machine performance: Tflops
Microbenchmarks: Stream
Algorithmic benchmarks: Linpack
Compact Apps/Kernels: NAS benchmarks
Application Suites: SPEC
User-specific applications: Custom benchmarks
(The classes above range from the computer hardware end to the applications end of the spectrum.)
Why do we need benchmarks?
Identify problems: measure machine properties
Time evolution: verify that we make progress
Coverage:
Help the vendors to have representative codes:
– Increase competition by transparency
– Drive future development (see SPEC CPU2000)
Relevance:
Help the customers to choose the right computer
Comparison of different benchmark classes

| | Coverage | Relevance | Identify problems | Time evolution |
|---|---|---|---|---|
| Micro | 0 | 0 | ++ | + |
| Algorithmic | - | 0 | + | ++ |
| Kernels | 0 | 0 | + | + |
| SPEC | + | + | + | + |
| Apps | - | ++ | 0 | 0 |
SPEC CPU2006
(From John Henning’s talk at the SPEC Workshop, June 2007, Dresden)
SPEC CPU2006 History
Released August 2006
Replaces CPU2000 (retired February 2007)
Fifth CPU benchmark suite:
– SPECmark (later called “CPU89”)
– SPEC92 (later called “CPU92”)
– CPU95
– CPU2000
– CPU2006
Note: these updates are required to stay representative
Question to the audience: What kind of application would you add?
CINT2006

| Benchmark | Lang. | Application Area | Brief Description |
|---|---|---|---|
| 400.perlbench | C | Programming Language | Derived from Perl V5.8.7. The workload includes SpamAssassin, MHonArc (an email indexer), and specdiff (SPEC's tool that checks benchmark outputs). |
| 401.bzip2 | C | Compression | Julian Seward's bzip2 version 1.0.3, modified to do most work in memory, rather than doing I/O. |
| 403.gcc | C | C Compiler | Based on gcc version 3.2, generates code for Opteron. |
| 429.mcf | C | Combinatorial Optimization | Vehicle scheduling. Uses a network simplex algorithm (which is also used in commercial products) to schedule public transport. |
| 445.gobmk | C | Artificial Intelligence: Go | Plays the game of Go, a simply described but deeply complex game. |
| 456.hmmer | C | Search Gene Sequence | Protein sequence analysis using profile hidden Markov models (profile HMMs). |
| 458.sjeng | C | Artificial Intelligence: Chess | A highly ranked chess program that also plays several chess variants. |
| 462.libquantum | C | Physics: Quantum Computing | Simulates a quantum computer, running Shor's polynomial-time factorization algorithm. |
| 464.h264ref | C | Video Compression | A reference implementation of H.264/AVC; encodes a video stream using 2 parameter sets. The H.264/AVC standard is expected to replace MPEG2. |
| 471.omnetpp | C++ | Discrete Event Simulation | Uses the OMNeT++ discrete event simulator to model a large Ethernet campus network. |
| 473.astar | C++ | Path-finding Algorithms | Pathfinding library for 2D maps, including the well known A* algorithm. |
| 483.xalancbmk | C++ | XML Processing | A modified version of Xalan-C++, which transforms XML documents to other document types. |
CFP2006 (part I)

| Benchmark | Lang. | Application Area | Brief Description |
|---|---|---|---|
| 410.bwaves | Fortran | Fluid Dynamics | Computes 3D transonic transient laminar viscous flow. |
| 416.gamess | Fortran | Quantum Chemistry | Implements a wide range of quantum chemical computations. The SPEC workload does self-consistent field calculations using the Restricted Hartree-Fock method, Restricted open-shell Hartree-Fock, and Multi-Configuration Self-Consistent Field. |
| 433.milc | C | Physics / QCD | A gauge field generating program for lattice gauge theory with dynamical quarks. |
| 434.zeusmp | Fortran | Physics / CFD | ZEUS-MP is a computational fluid dynamics code developed at the Laboratory for Computational Astrophysics (NCSA, University of Illinois at Urbana-Champaign) for the simulation of astrophysical phenomena. |
| 435.gromacs | C, Fortran | Biochemistry | Molecular dynamics, i.e. simulates Newtonian equations of motion for hundreds to millions of particles. The test case simulates the protein lysozyme in a solution. |
| 436.cactusADM | C, Fortran | Physics / General Relativity | Solves the Einstein evolution equations using a staggered-leapfrog numerical method. |
| 437.leslie3d | Fortran | Fluid Dynamics | Computational Fluid Dynamics (CFD) using Large-Eddy Simulations with Linear-Eddy Model in 3D. Uses MacCormack Predictor-Corrector time integration. |
| 444.namd | C++ | Biology / Molecular Dynamics | Simulates biomolecular systems. The test case has 92,224 atoms of apolipoprotein A-I. |
| 447.dealII | C++ | Finite Element Analysis | deal.II is a C++ library targeted at adaptive finite elements and error estimation. The test case solves a Helmholtz-type equation with non-constant coefficients. |
CFP2006 (part II)

| Benchmark | Language | Application Area | Brief Description |
|---|---|---|---|
| 450.soplex | C++ | Linear Programming, Optimization | Solves a linear program using a simplex algorithm and sparse linear algebra. Test cases include railroad planning and military airlift models. |
| 453.povray | C++ | Image Ray-tracing | Image rendering. The test case is a 1280x1024 anti-aliased image of a landscape with some abstract objects with textures using a Perlin noise function. |
| 454.calculix | C, Fortran | Structural Mechanics | Finite element code for 3D structural applications. Uses the SPOOLES solver library. |
| 459.GemsFDTD | Fortran | Electromagnetics | Solves the Maxwell equations in 3D using the finite-difference time-domain (FDTD) method. |
| 465.tonto | Fortran | Quantum Chemistry | An open source quantum chemistry package, using an object-oriented design in Fortran 95. The test case places a constraint on a molecular Hartree-Fock wavefunction calculation to better match experimental X-ray diffraction data. |
| 470.lbm | C | Fluid Dynamics | Implements the "Lattice-Boltzmann Method" to simulate incompressible fluids in 3D. |
| 481.wrf | C, Fortran | Weather | Weather modeling from scales of meters to thousands of kilometers. The test case is from a 30 km area over 2 days. |
| 482.sphinx3 | C | Speech Recognition | A widely known speech recognition system from Carnegie Mellon University. |
Code growth
Metrics
Speed
– SPECint_base2006 (Required Base result)
– SPECint2006 (Optional Peak result)
– SPECfp_base2006 (Required Base result)
– SPECfp2006 (Optional Peak result)
Throughput
– SPECint_rate_base2006 (Required Base result)
– SPECint_rate2006 (Optional Peak result)
– SPECfp_rate_base2006 (Required Base result)
– SPECfp_rate2006 (Optional Peak result)
Speed Metric for a Single Benchmark
For each benchmark in the suite, compute the ratio of the run time on a reference system to the run time on the system under test (the SPECratio)
– Reference system: a 1997 Sun system with a 296 MHz UltraSPARC II
– Similar, but not identical, to the CPU2000 reference machine
Example:
– 400.perlbench on a 2006 iMac took 948 seconds
– On the reference system, it took 9770 seconds
– SPECratio = 9770 / 948 ≈ 10.3
– If your workload looks like Perl, you might find that this modern iMac
runs around 10x faster than a state-of-the-art workstation from 1997.
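A minimal sketch of the per-benchmark SPECratio, assuming only what the slide states: the ratio is the reference run time divided by the measured run time, so higher is better and a value above 1 means "faster than the reference machine". The function name `spec_ratio` is hypothetical and not part of the SPEC tools.

```python
def spec_ratio(reference_seconds: float, measured_seconds: float) -> float:
    """How many times faster the system under test ran one benchmark than
    the reference machine (hypothetical helper, not a SPEC tool API)."""
    if reference_seconds <= 0 or measured_seconds <= 0:
        raise ValueError("run times must be positive")
    return reference_seconds / measured_seconds


# 400.perlbench: 9770 s on the 1997 reference system, 948 s on the 2006 iMac
ratio = spec_ratio(9770, 948)
print(f"SPECratio = {ratio:.1f}")  # prints "SPECratio = 10.3"
```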
Overall Speed Metric
The overall speed metric is the geometric mean of the individual
SPECratios
Why the geometric mean?
Because this is the best answer to the question
“Without knowing how much time I will spend in text processing vs. network
mapping vs. compiling vs. video compression, please tell me about how
much faster this machine will be than the reference system.”
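A sketch of the overall metric, computing the geometric mean through logarithms so long products cannot overflow. The per-benchmark run times below are made-up illustrative numbers, not SPEC results, and `overall_speed` is a hypothetical helper. The loop at the end checks a well-known property of the geometric mean: the ratio of two machines' geometric means equals the geometric mean of their per-benchmark time ratios, so the comparison does not depend on which reference times were chosen. That is one reason it suits a question where the workload mix is unknown.

```python
import math


def geometric_mean(values: list[float]) -> float:
    """Geometric mean via the mean of logarithms (all values must be > 0)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def overall_speed(reference: list[float], measured: list[float]) -> float:
    """Geometric mean of the per-benchmark SPECratios (reference / measured)."""
    if len(reference) != len(measured):
        raise ValueError("one measured time per reference time is required")
    return geometric_mean([ref / t for ref, t in zip(reference, measured)])


# Hypothetical run times in seconds for three benchmarks (illustrative only)
machine_a = [100.0, 400.0, 50.0]
machine_b = [200.0, 100.0, 200.0]

# Two different, equally arbitrary sets of reference times
ref_1 = [1000.0, 1000.0, 1000.0]
ref_2 = [9770.0, 500.0, 12000.0]

for ref in (ref_1, ref_2):
    a, b = overall_speed(ref, machine_a), overall_speed(ref, machine_b)
    # Same A/B comparison for both reference sets: prints "A is 1.260x as fast as B"
    print(f"A is {a / b:.3f}x as fast as B")
```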
Motivation for Throughput Metric
Throughput differs from speed
Stove analogy:
– One big flame heats one big pot holding one hogshead (~250 liters) in one hour
– 6 little flames heat 6 little pots, each holding one firkin (~40 liters), in 15 minutes
– Which is better?
Well, the big flame does ~250 liters/hour; each little flame does only ~40 * 4 =
160 liters/hour, but all 6 little flames together do 6 * 160 = 960 liters/hour
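The stove arithmetic can be written out as a small throughput calculation. The volumes are the slide's rounded figures (~250 liters per hogshead, ~40 liters per firkin), and the helper `liters_per_hour` is hypothetical.

```python
def liters_per_hour(liters_per_pot: float, minutes_per_pot: float, flames: int = 1) -> float:
    """Throughput of `flames` identical burners, each heating one pot at a time."""
    return flames * liters_per_pot * 60.0 / minutes_per_pot


big = liters_per_hour(250, 60)                 # one hogshead per hour
small_each = liters_per_hour(40, 15)           # one firkin per 15 minutes
small_all = liters_per_hour(40, 15, flames=6)  # all six little flames

print(big, small_each, small_all)  # prints "250.0 160.0 960.0"
```

Per flame, the big burner is faster (250 vs. 160 liters/hour), yet the six small burners together deliver almost four times as much soup per hour. That gap is what a throughput metric captures and a speed metric hides.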
Throughput vs. Speed
Big flame does ~250 liters/hour; each little flame does only ~40 * 4 = 160
liters/hour
Alternatives:
– If I only need to heat up an UNOPENED container holding 1 gallon of
soup, supper can be served most quickly if I put it on the big flame
– If I need to heat up one butt of soup (=2 hogsheads), and if I can open
the container, I'd be better off using many small flames
In IT terms:
– Processing one image in Photoshop or GIMP (speed) vs.
– Rendering the next movie with thousands of frames (throughput)
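A sketch of the two alternatives as a completion-time calculation, under the analogy's simplifying assumption that heating time grows linearly with volume. An unopened container cannot be split, so only one flame can work on it; an opened one can be divided among all flames. The rates are the slide's ~250 and ~160 liters/hour, 1 US gallon is taken as 3.785 liters, and the function `minutes_to_heat` is hypothetical.

```python
def minutes_to_heat(liters: float, rate_per_flame: float, flames: int, divisible: bool) -> float:
    """Minutes until all the soup is hot, assuming heating time is linear in volume.

    An unopened (indivisible) container occupies a single flame, so extra
    flames do not help; an opened container is split evenly across flames.
    """
    usable_flames = flames if divisible else 1
    return 60.0 * liters / (rate_per_flame * usable_flames)


GALLON = 3.785   # liters in one US gallon
BUTT = 2 * 250   # one butt = 2 hogsheads, ~500 liters

# Speed: one unopened gallon, the single fast flame wins
print(f"{minutes_to_heat(GALLON, 250, 1, divisible=False):.2f} min")  # prints "0.91 min"
print(f"{minutes_to_heat(GALLON, 160, 6, divisible=False):.2f} min")  # prints "1.42 min"

# Throughput: an opened butt, the six slower flames win
print(f"{minutes_to_heat(BUTT, 250, 1, divisible=True):.2f} min")  # prints "120.00 min"
print(f"{minutes_to_heat(BUTT, 160, 6, divisible=True):.2f} min")  # prints "31.25 min"
```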
CPU2006 Throughput Metric
Formula:
rate = number of copies run * reference time for the benchmark / elapsed time
in seconds
Example:
Sun Fire E25K runs 144 copies of 400.perlbench in 1066 seconds:
144 * 9770 / 1066 ≈ 1320
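The throughput formula for one benchmark, as a minimal sketch that follows the slide's definition directly; the helper name `rate_metric` is hypothetical.

```python
def rate_metric(copies: int, reference_seconds: float, elapsed_seconds: float) -> float:
    """Throughput ratio for one benchmark:
    copies run * reference time / elapsed wall-clock time (all in seconds)."""
    if copies < 1 or reference_seconds <= 0 or elapsed_seconds <= 0:
        raise ValueError("copies must be >= 1 and times must be positive")
    return copies * reference_seconds / elapsed_seconds


# Sun Fire E25K: 144 copies of 400.perlbench finished in 1066 seconds
print(round(rate_metric(144, 9770, 1066)))  # prints 1320
```

With a single copy the formula reduces to the speed SPECratio (reference time / elapsed time), so the copy count is what turns a speed ratio into a throughput figure.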
Summary of Metrics
Two different kinds of metrics
– speed (single application turnaround)
– rate (throughput)
Run rules make the difference between base and peak
– Base: conservative optimization, less freedom
– Peak: more aggressive optimization, more freedom
Two benchmark sets: SPECint and SPECfp
⇒ speed/rate × base/peak × int/fp = 2^3 = 8 different metrics
If you look at the single application results (12 integer and 17 floating-point benchmarks), you get:
⇒ 2 × 2 × (12 + 17) = 116 different metrics
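The counting can be checked by enumerating the combinations. The naming pattern is read off the SPECint/SPECfp metric names listed under Metrics (base results carry "_base", throughput results carry "_rate"); the helper `metric_name` is hypothetical and the benchmark counts are the 12 + 17 used above.

```python
from itertools import product

SUITES = {"int": 12, "fp": 17}  # benchmarks per suite in CPU2006


def metric_name(suite: str, rate: bool, base: bool) -> str:
    """Build an overall CPU2006 metric name, e.g. SPECfp_rate_base2006."""
    return f"SPEC{suite}{'_rate' if rate else ''}{'_base' if base else ''}2006"


# suite x (speed, rate) x (peak, base)
overall = [metric_name(s, r, b) for s, r, b in product(SUITES, (False, True), (False, True))]
# speed/rate x base/peak x every individual benchmark
per_benchmark = 2 * 2 * sum(SUITES.values())

print(len(overall), per_benchmark)  # prints "8 116"
```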
Example of Run Rules
Base does not allow feedback-directed optimization (still legal in peak)
An unlimited number of flags may be set in base
– Why? Because flag counting is not worth arguing about.
– For example, is -fast:np27 one flag, two, or three? Prove it.
– What if it's -fast_np27?
– What if it's -fast np27 or -fast -np27?
SPEC CPU2000 Result
Thank You!