cpe 442 introduction to computer architecture the role of performance
DESCRIPTION
CpE 442 Introduction to Computer Architecture The Role of Performance. Instructor: H. H. Ammar. Overview of Today’s Lecture: The Role of Performance. Review from Last Lecture Definition and Measures of Performance Summarizing Performance and Performance Pitfalls. - PowerPoint PPT PresentationTRANSCRIPT
CpE442 Lec2.1 Introduction to Computer Architectures
CpE 442Introduction to Computer
Architecture
The Role of Performance
Instructor: H. H. Ammar
CpE442 Lec2.2 Introduction to Computer Architectures
Overview of Today’s Lecture: The Role of Performance
° Review from Last Lecture
° Definition and Measures of Performance
° Summarizing Performance and Performance Pitfalls
CpE442 Lec2.3 Introduction to Computer Architectures
Review: What is "Computer Architecture"
° Co-ordination of levels of abstraction
I/O systemInstr. Set Proc.
CompilerOperating
System
Application
Digital Design
Circuit Design
° Under a set of rapidly changing Forces
Instruction Set Architecture
CpE442 Lec2.4 Introduction to Computer Architectures
Review: Levels of Representation
High Level Language Program
Assembly Language Program
Machine Language Program
Control Signal Specification
Compiler
Assembler
Machine Interpretation
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
lw $15, 0($2)lw $16, 4($2)sw $16, 0($2)sw $15, 4($2)
0000 1001 1100 0110 1010 1111 0101 10001010 1111 0101 1000 0000 1001 1100 0110 1100 0110 1010 1111 0101 1000 0000 1001 0101 1000 0000 1001 1100 0110 1010 1111
CpE442 Lec2.5 Introduction to Computer Architectures
Review: Levels of Organization
SPARCstation 20
SPARC Processor
Computer
Control
Datapath
Memory Devices
Input
Output
CpE442 Lec2.6 Introduction to Computer Architectures
Review: Summary from Last Lecture
° All computers consist of five components
• Processor: (1) datapath and (2) control
• (3) Memory
• (4) Input devices and (5) Output devices
° Not all “memory” are created equally
• Cache: fast (expensive) memory are placed closer to the processor
• Main memory: less expensive memory--we can have more
° Input and output (I/O) devices has the messiest organization
• Wide range of speed: graphics vs. keyboard
• Wide range of requirements: speed, standard, cost ... etc.
• Least amount of research (so far)
CpE442 Lec2.7 Introduction to Computer Architectures
Processor Performance
Year
P
e
r
f
o
r
m
a
n
c
e
0
20
40
60
80
100
120
1987 1988 1989 1990 1991 1992 1993
DEC AXP
3000
HP 9000/750
IBM
RS6000/540MIPS M2000MIPS M/120Sun-4/260
IBM Power 2/590
1.54X/yr
1.35X/yr
CpE442 Lec2.8 Introduction to Computer Architectures
Metrics of performance
Compiler
Programming Language
Application
DatapathControl
Transistors Wires Pins
ISA
Function Units
(millions) of Instructions per second – MIPS(millions) of (F.P.) operations per second – MFLOP/s
Cycles per second (clock rate)
Megabytes per second
Answers per monthOperations per second
CpE442 Lec2.9 Introduction to Computer Architectures
Relating Processor Metrics
° CPU execution time = CPU clock cycles/pgm X clock cycle time
° or CPU execution time = CPU clock cycles/pgm ÷ clock rate
° CPU clock cycles/pgm = Instructions/pgm X CPI the avg. clock cycles per instruction
° or CPI = CPU clock cycles/pgm ÷ Instructions/pgm
° CPI tells us something about the Instruction Set Architecture, the Implementation of that architecture, and the program measured
CpE442 Lec2.10 Introduction to Computer Architectures
Aspects of CPU Performance
CPU time = Seconds = Instructions x Cycles x Seconds
Program Program Instruction Cycle
CPU time = Seconds = Instructions x Cycles x Seconds
Program Program Instruction Cycle
instr. count CPI clock rate
Program
Compiler
Instr. Set Arch.
Organization
Technology
CpE442 Lec2.11 Introduction to Computer Architectures
Aspects of CPU Performance
CPU time = Seconds = Instructions x Cycles x Seconds
Program Program Instruction Cycle
CPU time = Seconds = Instructions x Cycles x Seconds
Program Program Instruction Cycle
instr count CPI clock rate
Program X (x)
Compiler X (x)
Instr. Set. X X
Organization X X
Technology X
CpE442 Lec2.12 Introduction to Computer Architectures
Organizational Trade-offs
Compiler
Programming Language
Application
DatapathControl
Transistors Wires Pins
ISA
Function Units
Instruction Mix
Cycle Time
CPI
CpE442 Lec2.13 Introduction to Computer Architectures
CPI
CPU time = ClockCycleTime *CPI * Ii = 1
n
i i
CPI = CPI * F where F = I i = 1
n
i i i i
Instruction Count
"instruction frequency"
Invest Resources where time is Spent!
CPI = (CPU Time * Clock Rate) / Instruction Count = Clock Cycles / Instruction Count
“Average cycles per instruction”
CpE442 Lec2.14 Introduction to Computer Architectures
Example
Typical Mix
Base Machine (Reg / Reg)
Op Freq(Fi) CPI(i) % Time
ALU 50% 1 .5 33%
Load 20% 2 .4 27%
Store 10% 2 .2 13%
Branch 20% 2 .4 27%
1.5
The CPI = 1.5 cycles per instructionAssignment 1: Turn in the solution of the following problems from the text book By Thursday September 4,Chapter 2, Exercises Section, problems number 2.1, 2.2, 2.3, 2.4, 2.10, 2.11, 2.12, 2.13, and 2.15
CpE442 Lec2.15 Introduction to Computer Architectures
Enhanced Machine (Reg / Reg)
Op Freq CPI(i) % Time
ALU 50% 1 .5 25%
Load 20% 3 .6 30%
Store 10% 3 .3 15%
Branch20% 3 .6 30%
2.0
Assume a program of 1 million instructions, Compare the performance of
Base Machine (B) with the above CPI, 1 GHZ clock, and
Enhanced Machine (E) with 1.333 GHZ and a one cycle increase for L/S
And branch instructions
CpE442 Lec2.16 Introduction to Computer Architectures
Perf. of machine X = 1 / exec. Time of prog on machine X
Perf. of E / Perf. of B = exec. Time of B / exec. Time of E = 1.5 * 1 / 2 * 0.75
= 1
Performance of B is similar to that of E,
No gain in performance
CpE442 Lec2.17 Introduction to Computer Architectures
Marketing Metrics
MIPS = Instruction Count / (Time * 10^6)
= Clock Rate / (CPI * 10^6)
•machines with different instruction sets ?
•programs with different instruction mixes ?
• dynamic frequency of instructions
• uncorrelated with performance
MFLOP/S= FP Operations / (Time * 10^6)
•machine dependent
•often not where time is spent
CpE442 Lec2.18 Introduction to Computer Architectures
Example showing why MIPS can failCompare performance with Compilers 1 and 2 for a given program on a given machine
Instruction Count in Billion forinstruction classes A B C
Compiler 1 5 1 1Compiler 2 10 1 1clock cycles 1 2 3
Clock cycles using compiler1 = 10 BillionClock cycles using compiler2 = 15 Billion
assuming 1GHZ clockCPU Time 1 = 10 secsCPU Time 2 = 15 secs
yet the MIPS rating isMIPS 1 = (instr. Count/cpu time in sec x 10^6) = 700MIPS 2 = 800
CpE442 Lec2.19 Introduction to Computer Architectures
Why Do Benchmarks?° How we evaluate differences
• Different systems
• Changes to a single system
° Provide a target
• Benchmarks should represent large class of important programs
• Improving benchmark performance should help many programs
° For better or worse, benchmarks shape a field
° Good ones accelerate progress
• good target for development
° Bad benchmarks hurt progress
• help real programs v. sell machines/papers?
• Inventions that help real programs don’t help benchmark
CpE442 Lec2.20 Introduction to Computer Architectures
Programs to Evaluate Processor Performance
° (Toy) Benchmarks
• 10-100 line
• e.g.,: sieve, puzzle, quicksort
° Synthetic Benchmarks
• attempt to match average frequencies of real workloads
• e.g., Whetstone, dhrystone
° Kernels
• Time critical excerpts Real programs
• e.g., gcc, spice
CpE442 Lec2.21 Introduction to Computer Architectures
Successful Benchmark: SPEC
° EE Times + 5 companies band together to perform Systems Performance Evaluation Committee (SPEC) in 1988: Sun, MIPS, HP, Apollo, DEC
° Create standard list of programs, inputs, reporting: some real programs, includes OS calls, some I/O
CpE442 Lec2.22 Introduction to Computer Architectures
SPEC first round
° First round 1989; 10 programs, single number to summarize performance
° One program: 99% of time in single line of code
° New front-end compiler could improve dramatically
Benchmark
SP
EC
Pe
rf
0
100
200
300
400
500
600
700
800
gcc
epresso
spice
doduc
nasa7
li
eqntott
matrix300
fpppp
tomcatv
CpE442 Lec2.23 Introduction to Computer Architectures
SPEC second round, SPEC95
• 8 integer benchmarks in C and 10 floating pt benchmarks in Fortran
CpE442 Lec2.24 Introduction to Computer Architectures
Amdahl's Law
Speedup due to enhancement E:
ExTime w/o E Performance w/ E
Speedup(E) = -------------------- = ---------------------
ExTime w/ E Performance w/o E
Suppose that enhancement E accelerates a fraction F of the task
by a factor S and the remainder of the task is unaffected then,
ExTime(with E) = ((1-F) + F/S) X ExTime(without E)
Speedup(with E) = ExTime(without E) ÷ ((1-F) + F/S) X ExTime(without E)
<= 1/(1-F) speed up is bounded by this factor
CpE442 Lec2.25 Introduction to Computer Architectures
Performance Evaluation Summary
° Time is the measure of computer performance!
° Good products created when have:
• Good benchmarks
• Good ways to summarize performance
° If not good benchmarks and summary, then choice between improving product for real programs vs. improving product to get more sales=> sales almost always wins
° Remember Amdahl’s Law: Speedup is limited by unimproved part of program
CPU time = Seconds = Instructions x Cycles x Seconds
Program Program Instruction Cycle
CPU time = Seconds = Instructions x Cycles x Seconds
Program Program Instruction Cycle