Recap
• Technology trends
• Cost/performance
Measuring and Reporting Performance
• What does it mean to say “computer X is faster than computer Y”?
E.g. Machine A executes a program in 10s; Machine B executes the same program in 15s.
Which is true:
1) A is 50% faster than B?
2) A is 33% faster than B?
Performance
• H&P’s definition: “X is n times faster than Y” means
$$ n = \frac{\text{Execution time}_Y}{\text{Execution time}_X} $$
• Performance is the reciprocal of time:
$$ n = \frac{\text{Performance}_X}{\text{Performance}_Y} $$
Example
• Answer: 1) A is 50% faster than B
$$ n = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{15}{10} = 1.5 $$
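A minimal sketch of the H&P definition, using the 10s and 15s times from the example; the function names are illustrative, not from the source. It also shows where the tempting 33% figure comes from: it is the reduction in execution time relative to B, not the ratio of the two times.

```python
def times_faster(time_x: float, time_y: float) -> float:
    """Return n such that "X is n times faster than Y" (H&P definition).

    n = ExecutionTime_Y / ExecutionTime_X = Performance_X / Performance_Y
    """
    return time_y / time_x


def percent_faster(time_x: float, time_y: float) -> float:
    """Express the same ratio as "X is p% faster than Y", with p = (n - 1) * 100."""
    return (times_faster(time_x, time_y) - 1.0) * 100.0


def percent_time_saved(time_x: float, time_y: float) -> float:
    """Reduction in execution time relative to Y (the misleading 33% answer)."""
    return (time_y - time_x) / time_y * 100.0


if __name__ == "__main__":
    time_a, time_b = 10.0, 15.0  # seconds, from the example
    print(times_faster(time_a, time_b))                  # 1.5
    print(percent_faster(time_a, time_b))                # 50.0
    print(round(percent_time_saved(time_a, time_b), 1))  # 33.3
```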
Performance
• Response time?
• Throughput?
Measuring Performance
• Focus on execution time of real programs
• Measuring execution time?
  – Wall clock time (elapsed time)
  – CPU time (excludes I/O and other processes)
    o User CPU time
    o System CPU time
iota:~$ time gcc -g tmpcnv.s -o tmpcnv
real    0m3.352s
user    0m0.367s
sys     0m0.468s
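The distinction between wall-clock time and user/system CPU time can also be observed from inside a program. This sketch assumes Python's standard `os.times()` and `time.perf_counter()`; the workload is a hypothetical mix of computation and waiting, so real time should clearly exceed user + sys. Unlike the shell's `time`, it measures only work done in the calling process, not a child process such as `gcc`.

```python
import os
import time


def measure(fn, *args):
    """Run fn(*args) once; report real (wall-clock), user and sys CPU time in seconds."""
    t0 = os.times()           # user/system CPU time consumed by this process so far
    w0 = time.perf_counter()  # high-resolution wall clock
    result = fn(*args)
    w1 = time.perf_counter()
    t1 = os.times()
    timings = {
        "real": w1 - w0,
        "user": t1.user - t0.user,
        "sys": t1.system - t0.system,
    }
    return result, timings


def mixed_workload(n: int) -> int:
    """Hypothetical workload: some computation plus some waiting."""
    total = sum(i * i for i in range(n))  # consumes user CPU time
    time.sleep(0.2)                       # elapses wall time but almost no CPU time
    return total


if __name__ == "__main__":
    _, t = measure(mixed_workload, 200_000)
    for name in ("real", "user", "sys"):
        print(f"{name:<4} {t[name]:.3f}s")
```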
Choosing Programs to Measure Performance
• Real Programs
  – Compilers, text-processing, CAD tools, etc.
• Modified applications
  – Scripted or modified for portability
• Kernels
  – Attempt to extract key sections from real programs (Livermore loops, Linpack)
• Toy Benchmarks
  – Short examples (e.g. Sieve of Eratosthenes)
• Synthetic Benchmarks
  – Whetstone, Dhrystone
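A toy benchmark is easy to write, which is also why it says little about real workloads. As an illustration only, here is a Sieve of Eratosthenes with a best-of-N wall-clock timer; the limit and repeat count are arbitrary assumptions.

```python
import math
import time
from typing import List


def sieve(limit: int) -> List[int]:
    """Sieve of Eratosthenes: return all primes <= limit."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Cross off multiples of p; smaller multiples were already removed
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def run_toy_benchmark(limit: int = 200_000, repeats: int = 3) -> float:
    """Best-of-N wall-clock time (seconds) for a single sieve run."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        sieve(limit)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print(f"sieve(200000): {run_toy_benchmark():.4f} s (best of 3)")
```

Taking the best of several runs reduces noise from other processes, but the result still characterises only this tiny loop nest.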
Benchmarking
• H&P: car magazines are more scientific about reporting performance than many CS journals!
Benchmark Suites
• Collections of benchmarks
  – E.g. SPEC CPU2000 (INT and FP)
    • 25 real FORTRAN/C/C++ programs, modified for portability
  – Specific graphics benchmarks
Server Benchmarks
• SPEC also has server benchmarks
  – File server
  – Web server
• TPC: Transaction Processing Performance Council
  – Various transaction processing benchmarks
Embedded Benchmarks
• Much less well developed
  – Tend to use Dhrystone!
• EEMBC
  – Recent development
  – 34 benchmarks (mainly kernels) in five application areas
Summarising Performance Measurements
• Complex area
  – Weighted arithmetic mean
  – Geometric mean
  – Normalised results
  – …
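The pitfall with averaging normalised results can be shown with two hypothetical machines and two programs (the times below are invented for illustration). The arithmetic mean of normalised times ranks the machines differently depending on which machine is the reference, while the geometric mean gives the same answer either way. The geometric mean does not, however, predict total execution time; a weighted arithmetic mean of the raw times does, for the assumed workload mix.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def weighted_arithmetic_mean(times, weights):
    """Sum of w_i * T_i, where the weights (summing to 1) describe the workload mix."""
    return sum(w * t for w, t in zip(weights, times))


def geometric_mean(values):
    """n-th root of the product of n positive values (computed via logs)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalise(times, reference_times):
    """Execution-time ratios relative to a reference machine."""
    return [t / r for t, r in zip(times, reference_times)]


# Hypothetical execution times (seconds) for programs P1 and P2
machine_a = [1.0, 1000.0]
machine_b = [10.0, 100.0]

b_vs_a = normalise(machine_b, machine_a)  # B normalised to A: [10.0, 0.1]
a_vs_b = normalise(machine_a, machine_b)  # A normalised to B: [0.1, 10.0]

if __name__ == "__main__":
    # Arithmetic means: each machine looks about 5x slower than the other
    print(arithmetic_mean(b_vs_a), arithmetic_mean(a_vs_b))
    # Geometric means: about 1.0 whichever machine is the reference
    print(geometric_mean(b_vs_a), geometric_mean(a_vs_b))
    # Equal-weight arithmetic means of raw times: A 500.5 s, B 55.0 s
    print(weighted_arithmetic_mean(machine_a, [0.5, 0.5]),
          weighted_arithmetic_mean(machine_b, [0.5, 0.5]))
```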
1.6 Quantitative Principles
• Make the common case fast!
  – E.g. addition: focus on “normal” addition, not overflow situations
• Amdahl’s Law
  – Quantifies improvements gained by focussing on one aspect of a design
Amdahl’s Law
$$ \text{Speedup} = \frac{\text{Performance with enhancement}}{\text{Performance without enhancement}} = \frac{\text{Execution time without enhancement}}{\text{Execution time with enhancement}} = \frac{1}{(1 - F_E) + \dfrac{F_E}{S_E}} $$
where
$F_E$ = Fraction enhanced
$S_E$ = Speedup of section enhanced
Example
• We are considering an enhancement that is 10 times faster than the original, but is only used 40% of the time.
$$ \text{Speedup} = \frac{1}{(1 - F_E) + \dfrac{F_E}{S_E}} = \frac{1}{(1 - 0.4) + \dfrac{0.4}{10}} = \frac{1}{0.64} \approx 1.56 $$
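Amdahl's Law is easy to check numerically. This sketch (the function name and argument checks are illustrative assumptions) reproduces the 1.56 result of the example and shows the upper bound 1 / (1 - F_E) that applies however large S_E becomes.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup = 1 / ((1 - F_E) + F_E / S_E).

    fraction_enhanced: F_E, the fraction of the original execution time
        that can use the enhancement (0 <= F_E <= 1).
    speedup_enhanced: S_E, how many times faster the enhanced part runs.
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    new_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time


if __name__ == "__main__":
    print(f"{amdahl_speedup(0.4, 10):.2f}")    # 1.56 (the example above)
    print(f"{amdahl_speedup(0.4, 1e12):.2f}")  # 1.67: bounded by 1 / (1 - 0.4)
```

Even an almost infinitely fast enhancement used 40% of the time cannot deliver more than about 1.67 overall, which is why the common case matters.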
CPU Performance
• CPU time related to clock speed:
  – Period (e.g. 1ns)
  – Rate (e.g. 1GHz)
• Also interested in Cycles Per Instruction (CPI)
Three Equal Factors
• Clock rate (technology)
• CPI (architecture)
• Instruction count (architecture and compiler)
$$ \text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time} = \frac{\text{IC} \times \text{CPI}}{\text{Clock rate}} $$
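The three-factor equation translates directly into code. The sketch below uses hypothetical numbers (not measurements of any real processor) to show why clock rate alone is not enough: a machine with a lower clock rate but a lower CPI can finish the same instruction count sooner.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = IC x CPI x clock cycle time = IC x CPI / clock rate."""
    clock_cycle_time = 1.0 / clock_rate_hz  # period: 1GHz -> 1ns
    return instruction_count * cpi * clock_cycle_time


if __name__ == "__main__":
    ic = 2e9  # hypothetical instruction count, same program on both machines
    fast_clock = cpu_time(ic, cpi=2.0, clock_rate_hz=1.7e9)  # about 2.35 s
    slow_clock = cpu_time(ic, cpi=1.0, clock_rate_hz=1.0e9)  # about 2.0 s
    print(f"1.7GHz, CPI 2.0: {fast_clock:.2f} s")
    print(f"1.0GHz, CPI 1.0: {slow_clock:.2f} s")
```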
Measuring IC & CPI
• Many modern processors include hardware counters for instructions and clock cycles
• Simulations can give even more detail
  – Time consuming, but can be very accurate
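Given hardware counter readings, CPI is simply cycles divided by instructions. On Linux, for example, `perf stat` reports both counts for a command; the readings used below are hypothetical.

```python
def cpi_from_counters(cycles: int, instructions: int) -> float:
    """Average clock cycles per instruction (CPI) from counter readings."""
    return cycles / instructions


def ipc_from_counters(cycles: int, instructions: int) -> float:
    """Instructions per cycle (IPC), the reciprocal of CPI."""
    return instructions / cycles


if __name__ == "__main__":
    # Hypothetical readings for one program run
    cycles, instructions = 3_000_000_000, 2_000_000_000
    print(cpi_from_counters(cycles, instructions))  # 1.5
```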
Another Principle: Locality
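• Locality of Reference
  – “90/10 Rule”: a program spends about 90% of its execution time in only 10% of its code
• Also applies to data
• Two aspects:
  – Temporal locality: recently accessed items are likely to be accessed again soon
  – Spatial locality: items whose addresses are close together tend to be referenced close together in time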
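Spatial locality can be illustrated without real hardware by counting misses in a toy cache model. The sketch assumes a direct-mapped cache of 64 lines of 64 bytes (4KB) and 8-byte array elements; these parameters are illustrative, not those of any particular processor. Traversing a 128 x 128 row-major array row by row uses all eight elements of each fetched line, whereas traversing it column by column misses on every access.

```python
def count_misses(addresses, line_size=64, num_lines=64):
    """Count misses in a toy direct-mapped cache (sizes in bytes)."""
    tags = [None] * num_lines
    misses = 0
    for addr in addresses:
        block = addr // line_size  # memory block holding this byte
        index = block % num_lines  # the only cache line that block may occupy
        if tags[index] != block:   # miss: fetch the block, evicting the old one
            tags[index] = block
            misses += 1
    return misses


def matrix_addresses(n, elem_size=8, row_major=True):
    """Byte addresses touched when traversing an n x n array stored row by row."""
    for outer in range(n):
        for inner in range(n):
            i, j = (outer, inner) if row_major else (inner, outer)
            yield (i * n + j) * elem_size


if __name__ == "__main__":
    print(count_misses(matrix_addresses(128, row_major=True)))   # 2048
    print(count_misses(matrix_addresses(128, row_major=False)))  # 16384
```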
Taking Advantage of Parallelism
• Key principle for improving performance
• Examples:
  – System level: parallel processing, disk arrays, etc.
  – Processor level: pipelining
  – Digital design: caches, ALU adders, etc.
1.7 Putting It All Together: Performance & Price/Performance
• Measure performance and performance/cost for three categories
  – Desktop (SPEC INT and FP)
  – TP Servers (TPC-C)
  – Embedded Processors (EEMBC)
Desktop
• Integer:
  – Performance/cost tracks performance
• FP:
  – Not as closely related
  – Pentium 4 much better than Pentium III
• AMD Athlon very good value for money
Servers
• Twelve systems
  – Six top performers
  – Six best price-performance
• Multiprocessors
  – From 3 to 280 P3’s
• Cost:
  – From $131,000 to $15 million
Embedded Processors
• Difficult to assess
  – Benchmarks very new
  – Designs very application-specific
  – Power a major constraint
  – Cost difficult to quantify (are support chips required?)
Embedded Processors
• Range:
  – 500MHz AMD K6 ($78) and IBM PowerPC ($94), used for network switches, etc.
  – 167MHz NEC VR 5432 ($25), popular in colour laser printers
  – 180MHz NEC VR 4122 ($33), popular in PDAs (low power)
1.8 Another View: Power Consumption and Efficiency
• Embedded processors from previous example: power ranged from 700mW to 9600mW
• Fig. 1.27: Performance/watt
  – NEC VR 4122 is by far the leader
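Performance per watt is a simple ratio, but it can reorder a ranking. In this sketch the scores are hypothetical and the chip names are placeholders; only the two power values echo the ends of the 700mW to 9600mW range quoted above.

```python
def performance_per_watt(score: float, power_mw: float) -> float:
    """Benchmark score per watt, with power given in milliwatts."""
    return score / (power_mw / 1000.0)


# Hypothetical (score, power in mW) pairs; names are placeholders
chips = {
    "low_power_chip": (50.0, 700.0),
    "high_performance_chip": (300.0, 9600.0),
}

# Rank by performance/watt, best first
ranking = sorted(chips, key=lambda name: performance_per_watt(*chips[name]), reverse=True)

if __name__ == "__main__":
    for name in ranking:
        print(f"{name}: {performance_per_watt(*chips[name]):.1f} per watt")
```

The slower chip wins here because its power draw is more than 13 times lower while its score is only 6 times lower.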
1.9 Fallacies and Pitfalls
• Fallacy: Relative performance of two similar processors can be judged by clock rate or by a single benchmark
  – Factors such as pipeline structure and memory system have major impact
  – E.g. Pentium III vs. Pentium 4 (Fig. 1.28)
1.7GHz P4 vs. 1.0GHz P3
Fallacies and Pitfalls
• Fallacy: Benchmarks remain valid indefinitely
  – Optimisations change
  – Perhaps deliberately!
  – Even real programs are affected by changes in technology
  – E.g. gcc: increasing percentage is “system time”
  – SPEC has adapted considerably
Fallacies and Pitfalls
• Pitfall: Comparing hand-coded assembly and compiled high-level language performance
  – E.g. embedded processor benchmarks
  – Hand-coded is 5 to 87 times faster!