Recap

• Technology trends

• Cost/performance

Measuring and Reporting Performance

• What does it mean to say “computer X is faster than computer Y”?

E.g. Machine A executes a program in 10s; Machine B executes the same program in 15s.

Which is true:
1) A is 50% faster than B?
2) A is 33% faster than B?

Performance

• H&P’s definition: “X is n times faster than Y” means

$$\frac{\text{Execution Time}_Y}{\text{Execution Time}_X} = n$$

• Performance is the reciprocal of time:

$$n = \frac{\text{Performance}_X}{\text{Performance}_Y}$$

Example

E.g. Machine A executes a program in 10s; Machine B executes the same program in 15s.

Which is true:
1) A is 50% faster than B?
2) A is 33% faster than B?

$$n = \frac{\text{Execution Time}_B}{\text{Execution Time}_A} = \frac{15}{10} = 1.5$$

• Answer: 1) A is 50% faster than B
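The ratio in this example can be checked with a few lines of Python. This is only a sketch: the function name `times_faster` is an illustrative assumption, not something taken from H&P.

```python
def times_faster(time_x: float, time_y: float) -> float:
    """Return n such that "X is n times faster than Y".

    Uses n = Execution Time_Y / Execution Time_X.
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Machine A takes 10 s and Machine B takes 15 s on the same program.
n = times_faster(10.0, 15.0)
print(f"n = {n:.1f}")                              # prints "n = 1.5"
print(f"A is {(n - 1) * 100:.0f}% faster than B")  # prints "A is 50% faster than B"
```

The tempting 33% figure comes from (15 - 10) / 15, which describes how much shorter A's execution time is, not how much faster A is.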

Performance

• Response time?

• Throughput?

Measuring Performance

• Focus on execution time of real programs

• Measuring execution time?
  – Wall clock time (elapsed time)
  – CPU time (excludes I/O and other processes)
    o User CPU time
    o System CPU time

iota:~$ time gcc -g tmpcnv.s -o tmpcnv

real 0m3.352s
user 0m0.367s
sys 0m0.468s
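The same three quantities can be collected from inside a program. The sketch below assumes Python's standard `time.perf_counter` for wall clock time and `os.times` for user and system CPU time; the helper `measure` and the workload `busy_work` are hypothetical names chosen for illustration.

```python
import os
import time


def measure(func):
    """Run func() once and return (wall, user, system) times in seconds."""
    cpu_start = os.times()
    wall_start = time.perf_counter()
    func()
    wall_end = time.perf_counter()
    cpu_end = os.times()
    return (
        wall_end - wall_start,              # wall clock (elapsed) time
        cpu_end.user - cpu_start.user,      # user CPU time of this process
        cpu_end.system - cpu_start.system,  # system CPU time of this process
    )


def busy_work():
    # Hypothetical CPU-bound workload, used only to have something to time.
    return sum(i * i for i in range(200_000))


real, user, system = measure(busy_work)
print(f"real {real:.3f}s")
print(f"user {user:.3f}s")
print(f"sys  {system:.3f}s")
```

Unlike the shell's `time`, this version counts CPU time for the current process only, not for child processes. In the gcc run above, real time (3.352s) far exceeds user plus system time (0.835s); the difference is time spent waiting on I/O or on other processes.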

Choosing Programs to Measure Performance

• Real Programs
  – Compilers, text-processing, CAD tools, etc.

• Modified applications
  – Scripted or modified for portability

• Kernels
  – Attempt to extract key sections from real programs (Livermore loops, Linpack)

• Toy Benchmarks
  – Short examples (e.g. Sieve of Eratosthenes)

• Synthetic Benchmarks
  – Whetstone, Dhrystone

Benchmarking

• H&P: car magazines are more scientific about reporting performance than many CS journals!

Benchmark Suites

• Collections of benchmarks
  – E.g. SPEC CPU2000 (INT and FP)
    • 25 real FORTRAN/C/C++ programs, modified for portability
  – Specific graphics benchmarks

Server Benchmarks

• SPEC also has server benchmarks
  – File server
  – Web server

• TPC: Transaction Processing Performance Council
  – Various transaction processing benchmarks

Embedded Benchmarks

• Much less well developed
  – Tend to use Dhrystone!

• EEMBC
  – Recent development
  – 34 benchmarks (mainly kernels) in five application areas

Summarising Performance Measurements

• Complex area
  – Weighted arithmetic mean
  – Geometric mean
  – Normalised results
  – …
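To see why this is a complex area, the sketch below computes two of the summaries listed above for the same measurements. All execution times and weights are hypothetical values invented for illustration, and the function names are assumptions rather than anything from H&P.

```python
import math


def weighted_arithmetic_mean(times, weights):
    """Weighted mean of execution times; weights need not sum to 1."""
    return sum(t * w for t, w in zip(times, weights)) / sum(weights)


def geometric_mean(ratios):
    """Geometric mean, typically applied to normalised results."""
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical execution times (seconds) for two programs.
reference = [10.0, 100.0]  # reference machine
machine_a = [5.0, 200.0]   # machine under test
weights = [0.5, 0.5]       # hypothetical equal weighting

# Normalised results: reference time divided by measured time.
normalised = [r / a for r, a in zip(reference, machine_a)]

print(weighted_arithmetic_mean(reference, weights))
print(weighted_arithmetic_mean(machine_a, weights))
print(geometric_mean(normalised))
```

With these made-up numbers the weighted arithmetic mean makes machine A look slower than the reference, while the geometric mean of the normalised results rates the two as equal. The choice of summary can change the conclusion.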

1.6 Quantitative Principles

• Make the common case fast!
  – E.g. addition: focus on “normal” addition, not overflow situations

• Amdahl’s Law
  – Quantifies improvements gained by focussing on one aspect of a design

Amdahl’s Law

$$\text{Speedup} = \frac{\text{Performance with enhancement}}{\text{Performance without enhancement}} = \frac{\text{Execution time without enhancement}}{\text{Execution time with enhancement}} = \frac{1}{(1 - F_E) + \dfrac{F_E}{S_E}}$$

where

$F_E$ = Fraction enhanced

$S_E$ = Speedup of enhanced section

Example

• We are considering an enhancement that is 10 times faster than the original, but is only used 40% of the time.

$$F_E = 0.4, \quad S_E = 10$$

$$\text{Speedup} = \frac{1}{(1 - F_E) + \dfrac{F_E}{S_E}} = \frac{1}{(1 - 0.4) + \dfrac{0.4}{10}} \approx 1.56$$
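Amdahl's Law is easy to evaluate in code. The sketch below is a direct transcription of the formula; the function and parameter names are illustrative assumptions.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup = 1 / ((1 - F_E) + F_E / S_E)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# The example above: 10 times faster, used 40% of the time.
print(f"{amdahl_speedup(0.4, 10.0):.2f}")  # prints "1.56"

# Upper bound if the enhanced part took no time at all: 1 / (1 - F_E).
print(f"{1.0 / (1.0 - 0.4):.2f}")          # prints "1.67"
```

Even an arbitrarily fast enhancement cannot push the overall speedup past 1 / (1 - F_E), which is why the fraction of time affected matters as much as the speedup of the enhanced section.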

CPU Performance

• CPU time related to clock speed:
  – Period (e.g. 1ns)
  – Rate (e.g. 1GHz)

• Also interested in Cycles Per Instruction (CPI)

Three Equal Factors

• Clock rate (technology)

• CPI (architecture)

• Instruction count (architecture and compiler)

$$\text{CPU Time} = IC \times CPI \times \text{Clock cycle time} = \frac{IC \times CPI}{\text{Clock rate}}$$
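The CPU time equation can be applied directly once the three factors are known. In the sketch below the instruction count and CPI are hypothetical values chosen for illustration; the 1GHz clock rate matches the example rate used earlier.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = IC x CPI / clock rate."""
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: 2 billion instructions at an average CPI of 1.5
# on a 1 GHz processor (clock cycle time 1 ns).
t = cpu_time(2e9, 1.5, 1e9)
print(f"{t:.1f} s")  # prints "3.0 s"

# Halving any one factor halves CPU time, so the three factors count equally.
print(f"{cpu_time(1e9, 1.5, 1e9):.1f} s")   # prints "1.5 s"
print(f"{cpu_time(2e9, 0.75, 1e9):.1f} s")  # prints "1.5 s"
```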

Measuring IC & CPI

• Many modern processors include hardware counters for instructions and clock cycles

• Simulations can give even more detail
  – Time consuming, but can be very accurate

Another Principle: Locality

• Locality of Reference
  – “90/10 Rule”: a program spends about 90% of its execution time in about 10% of its code

• Also applies to data

• Two aspects:
  – Temporal locality
  – Spatial locality

Taking Advantage of Parallelism

• Key principle for improving performance

• Examples:
  – System level: parallel processing, disk arrays, etc.
  – Processor level: pipelining
  – Digital design: caches, ALU adders, etc.

1.7 Putting It All Together: Performance & Price/Performance

• Measure performance and performance/cost for three categories
  – Desktop (SPEC INT and FP)
  – TP Servers (TPC-C)
  – Embedded Processors (EEMBC)

Desktop

• Integer:
  – Performance/cost tracks performance

• FP:
  – Not as closely related
  – Pentium 4 much better than Pentium III

• AMD Athlon very good value for money

Servers

• Twelve systems
  – Six top performers
  – Six best price-performance

• Multiprocessors
  – 3 P3s to 280 P3s

• Cost:
  – $131,000 to $15 million

Embedded Processors

• Difficult to assess
  – Benchmarks very new
  – Designs very application-specific
  – Power a major constraint
  – Cost difficult to quantify (are support chips required?)

Embedded Processors

• Range:
  – 500MHz AMD K6 ($78) and IBM PowerPC ($94) used for network switches, etc.
  – 167MHz NEC VR 5432 ($25) popular in colour laser printers
  – 180MHz NEC VR 4122 ($33) popular in PDAs (low power)

1.8 Another View: Power Consumption and Efficiency

• Embedded processors from previous example: power ranged from 700mW to 9600mW

• Fig. 1.27: Performance/watt
  – NEC VR 4122 leads by a huge margin

1.9 Fallacies and Pitfalls

• Fallacy: Relative performance of two similar processors can be judged by clock rate or by a single benchmark
  – Factors such as pipeline structure and memory system have major impact
  – E.g. Pentium III vs. Pentium 4 (Fig. 1.28): 1.7GHz P4 vs. 1.0GHz P3

Fallacies and Pitfalls

• Fallacy: Benchmarks remain valid indefinitely
  – Optimisations change
    – Perhaps deliberately!
  – Even real programs are affected by changes in technology
    – E.g. gcc: increasing percentage is “system time”
  – SPEC has adapted considerably

Fallacies and Pitfalls

• Pitfall: Comparing hand-coded assembly and compiled high-level language performance
  – E.g. embedded processor benchmarks
  – Hand-coded is 5 to 87 times faster!
