computer architecture chapter 2 the role of performance
DESCRIPTION
Computer Architecture Chapter 2 The Role of Performance. Introduction. This chapter discusses how to measure, report, and summarize performance and describes the major factor that determine the performance of a computer - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Computer Architecture Chapter 2 The Role of Performance](https://reader036.vdocuments.us/reader036/viewer/2022062423/56814646550346895db35453/html5/thumbnails/1.jpg)
Lec2.1
Computer Architecture
Chapter 2
The Role of Performance
![Page 2: Computer Architecture Chapter 2 The Role of Performance](https://reader036.vdocuments.us/reader036/viewer/2022062423/56814646550346895db35453/html5/thumbnails/2.jpg)
Lec2.2
° This chapter discusses how to measure, report, and summarize performance and describes the major factor that determine the performance of a computer
° A primary reason for examining performance is that hardware performance is often key to the effectiveness of an entire system of hardware and software
° Assessing the performance of such a system can be quite challenging
° Key to understanding underlying organizational motivation
Why is some hardware better than others for different programs?
What factors of system performance are hardware related?(e.g., Do we need a new machine, or a new operating system?)
How does the machine's instruction set affect performance?
° For example, to improve the performance of a software system, we may need to understand what factors in the hardware contribute to the overall performance and the relative importance of these factors
Introduction
![Page 3: Computer Architecture Chapter 2 The Role of Performance](https://reader036.vdocuments.us/reader036/viewer/2022062423/56814646550346895db35453/html5/thumbnails/3.jpg)
Lec2.3
Which of these airplanes has the best performance?
Airplane Passengers Range (mi) Speed (mph)
Boeing 737-100 101 630 598Boeing 747 470 4150 610BAD/Sud Concorde 132 4000 1350Douglas DC-8-50 146 8720 544
°How much faster is the Concorde compared to the 747?
°How much bigger is the 747 than the Douglas DC-8?
![Page 4: Computer Architecture Chapter 2 The Role of Performance](https://reader036.vdocuments.us/reader036/viewer/2022062423/56814646550346895db35453/html5/thumbnails/4.jpg)
Lec2.4
° Response Time (latency): the time between the start and completion of a task
— How long does it take for my job to run?
— How long does it take to execute a job?
— How long must I wait for the database query?
° Throughput: the total amount of work done in a given time
— How many jobs can the machine run at once?
— What is the average execution rate?
— How much work is getting done?
° If we upgrade a machine with a new processor what do we increase?
If we add a new machine to the lab what do we increase?
Computer Performance
![Page 5: Computer Architecture Chapter 2 The Role of Performance](https://reader036.vdocuments.us/reader036/viewer/2022062423/56814646550346895db35453/html5/thumbnails/5.jpg)
Lec2.5
Two notions of “performance”
Plane
Boeing 747
BAD/Sud Concorde
Speed
610 mph
1350 mph
DC to Paris
6.5 hours
3 hours
Passengers
470
132
Throughput (pmph)
286,700
178,200
Which has higher performance?
![Page 6: Computer Architecture Chapter 2 The Role of Performance](https://reader036.vdocuments.us/reader036/viewer/2022062423/56814646550346895db35453/html5/thumbnails/6.jpg)
Lec2.6
Example
• Time of Concorde vs. Boeing 747?
• Concord is 1350 mph / 610 mph = 2.2 times faster
= 6.5 hours / 3 hours
• Throughput of Concorde vs. Boeing 747 ?
• Concord is 178,200 pmph / 286,700 pmph = 0.62 “times faster”
• Boeing is 286,700 pmph / 178,200 pmph = 1.60 “times faster”
• Boeing is 1.6 times (“60%”) faster in terms of throughput
• Concord is 2.2 times (“120%”) faster in terms of flying time
![Page 7: Computer Architecture Chapter 2 The Role of Performance](https://reader036.vdocuments.us/reader036/viewer/2022062423/56814646550346895db35453/html5/thumbnails/7.jpg)
Lec2.7
° Elapsed Time• counts everything (disk and memory accesses, I/O, etc.)
• a useful number, but often not good for comparison purposes
° CPU time• doesn't count I/O or time spent running other programs
• can be broken up into system time, and user time
° Our focus: user CPU time • time spent executing the lines of code that are "in" our program
Execution Time
![Page 8: Computer Architecture Chapter 2 The Role of Performance](https://reader036.vdocuments.us/reader036/viewer/2022062423/56814646550346895db35453/html5/thumbnails/8.jpg)
Lec2.8
° For some program running on machine X,
PerformanceX = 1 / Execution timeX
° "X is n times faster than Y"
PerformanceX / PerformanceY = n
° Problem:• machine A runs a program in 20 seconds
• machine B runs the same program in 25 seconds
Book's Definition of Performance
![Page 9: Computer Architecture Chapter 2 The Role of Performance](https://reader036.vdocuments.us/reader036/viewer/2022062423/56814646550346895db35453/html5/thumbnails/9.jpg)
Lec2.9
Clock Cycles
° Almost all computers are constructed using a clock that runs at a constant rate and determines when events take place in the hardware
° These discrete time intervals are called clock cycles (or clock ticks, clock periods, clocks, cycles)
° Instead of reporting execution time in seconds, we often use cycles
° cycle time = time between ticks = seconds per cycle
° clock rate (frequency) = cycles per second (1 Hz. = 1 cycle/sec)
A 200 Mhz. clock has a cycle time
time
1
200 106 109 5 nanoseconds
![Page 10: Computer Architecture Chapter 2 The Role of Performance](https://reader036.vdocuments.us/reader036/viewer/2022062423/56814646550346895db35453/html5/thumbnails/10.jpg)
Lec2.10
° A given program will require
• some number of instructions (machine instructions)
• some number of cycles
• some number of seconds
° We have a vocabulary that relates these quantities:
• cycle time (seconds per cycle)
• clock rate (cycles per second)
• CPI (cycles per instruction)
a floating point intensive application might have a higher CPI
• MIPS (millions of instructions per second)
this would be higher for a program using simple instructions
Now that we understand cycles
![Page 11: Computer Architecture Chapter 2 The Role of Performance](https://reader036.vdocuments.us/reader036/viewer/2022062423/56814646550346895db35453/html5/thumbnails/11.jpg)
Lec2.11
CPI
CPI = (CPU Time * Clock Rate) / Instruction Count = Clock Cycles / Instruction Count
“Average cycles per instruction”
CPU clock cycles = CPI * Ci i
![Page 12: Computer Architecture Chapter 2 The Role of Performance](https://reader036.vdocuments.us/reader036/viewer/2022062423/56814646550346895db35453/html5/thumbnails/12.jpg)
Lec2.12
° Suppose we have two implementations of the same instruction set architecture (ISA).
For some program,
Machine A has a clock cycle time of 10 ns. and a CPI of 2.0 Machine B has a clock cycle time of 20 ns. and a CPI of 1.2
What machine is faster for this program, and by how much?
° If two machines have the same ISA which of our quantities (e.g., clock rate, CPI, execution time, # of instructions, MIPS) will always be identical?
CPI Example
![Page 13: Computer Architecture Chapter 2 The Role of Performance](https://reader036.vdocuments.us/reader036/viewer/2022062423/56814646550346895db35453/html5/thumbnails/13.jpg)
Lec2.13
° A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles (respectively).
The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of CThe second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C.
Which sequence will be faster? How much?What is the CPI for each sequence?
# of Instructions Example
![Page 14: Computer Architecture Chapter 2 The Role of Performance](https://reader036.vdocuments.us/reader036/viewer/2022062423/56814646550346895db35453/html5/thumbnails/14.jpg)
Lec2.14
° MIPS = Instruction count/(Execution time * 10 )
° Two different compilers are being tested for a 100 MHz. machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles (respectively). Both compilers are used to produce code for a large piece of software.
The first compiler's code uses 5 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.
The second compiler's code uses 10 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.
° Which sequence will be faster according to MIPS?
° Which sequence will be faster according to execution time?
MIPS example
6
![Page 15: Computer Architecture Chapter 2 The Role of Performance](https://reader036.vdocuments.us/reader036/viewer/2022062423/56814646550346895db35453/html5/thumbnails/15.jpg)
Lec2.15
Metrics of performance
Compiler
Programming Language
Application
DatapathControl
Transistors Wires Pins
ISA
Function Units
(millions) of Instructions per second – MIPS(millions) of (F.P.) operations per second – MFLOP/s
Cycles per second (clock rate)
Megabytes per second
Answers per month
Useful Operations per second
![Page 16: Computer Architecture Chapter 2 The Role of Performance](https://reader036.vdocuments.us/reader036/viewer/2022062423/56814646550346895db35453/html5/thumbnails/16.jpg)
Lec2.16
° Performance best determined by running a real application• Use programs typical of expected workload
• Or, typical of expected class of applicationse.g., compilers/editors, scientific applications, graphics, etc.
° Small benchmarks• nice for architects and designers
• easy to standardize
• can be abused
° SPEC (System Performance Evaluation Cooperative)• companies have agreed on a set of real program and inputs
• can still be abused (Intel’s “other” bug)
• valuable indicator of performance (and compiler technology)
Benchmarks
![Page 17: Computer Architecture Chapter 2 The Role of Performance](https://reader036.vdocuments.us/reader036/viewer/2022062423/56814646550346895db35453/html5/thumbnails/17.jpg)
Lec2.17
SPEC ‘89
° Compiler “enhancements” and performance
0
100
200
300
400
500
600
700
800
tomcatvfppppmatrix300eqntottlinasa7doducspiceespressogcc
BenchmarkCompiler
Enhanced compiler
SP
EC
pe
rfo
rman
ce r
atio
![Page 18: Computer Architecture Chapter 2 The Role of Performance](https://reader036.vdocuments.us/reader036/viewer/2022062423/56814646550346895db35453/html5/thumbnails/18.jpg)
Lec2.18
SPEC95
° Eighteen application benchmarks (with inputs) reflecting a technical computing workload
° Eight integer
• go, m88ksim, gcc, compress, li, ijpeg, perl, vortex
° Ten floating-point intensive
• tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fppp, wave5
![Page 19: Computer Architecture Chapter 2 The Role of Performance](https://reader036.vdocuments.us/reader036/viewer/2022062423/56814646550346895db35453/html5/thumbnails/19.jpg)
Lec2.19
SPEC ‘95
Benchmark Description
go Artificial intelligence; plays the game of Gom88ksim Motorola 88k chip simulator; runs test programgcc The Gnu C compiler generating SPARC codecompress Compresses and decompresses file in memoryli Lisp interpreterijpeg Graphic compression and decompressionperl Manipulates strings and prime numbers in the special-purpose programming language Perlvortex A database program
tomcatv A mesh generation programswim Shallow water model with 513 x 513 gridsu2cor quantum physics; Monte Carlo simulationhydro2d Astrophysics; Hydrodynamic Naiver Stokes equationsmgrid Multigrid solver in 3-D potential fieldapplu Parabolic/elliptic partial differential equationstrub3d Simulates isotropic, homogeneous turbulence in a cubeapsi Solves problems regarding temperature, wind velocity, and distribution of pollutantfpppp Quantum chemistrywave5 Plasma physics; electromagnetic particle simulation
![Page 20: Computer Architecture Chapter 2 The Role of Performance](https://reader036.vdocuments.us/reader036/viewer/2022062423/56814646550346895db35453/html5/thumbnails/20.jpg)
Lec2.20
SPEC ‘95
Does doubling the clock rate double the performance?
Can a machine with a slower clock rate have better performance?
Clock rate (MHz)
SP
EC
int
2
0
4
6
8
3
1
5
7
9
10
200 25015010050
Pentium
Pentium Pro
PentiumClock rate (MHz)
SP
EC
fp
Pentium Pro
2
0
4
6
8
3
1
5
7
9
10
200 25015010050
![Page 21: Computer Architecture Chapter 2 The Role of Performance](https://reader036.vdocuments.us/reader036/viewer/2022062423/56814646550346895db35453/html5/thumbnails/21.jpg)
Lec2.21
Execution Time After Improvement =
Execution Time Unaffected +( Execution Time Affected / Amount of Improvement )
° Example:
"Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?"
How about making it 5 times faster?
Amdahl's Law