performance


Upload: heller

Post on 06-Jan-2016


TRANSCRIPT

Page 1: Performance

Datorteknik PerformanceAnalyse bild 1

Performance

Performance
– what is it: measures of performance

The CPU Performance Equation
– Execution time as the measure
– what affects execution time
– examples

Choosing good benchmarks?
– choosing bad benchmarks?

Amdahl's Law

Page 2: Performance

Performance is Time

Time to do the task (Execution Time)
– execution time, response time, latency

Tasks per unit time (sec, minute, ...)
– throughput, bandwidth

Page 3: Performance

Performance as Response Time

Performance is most often measured as response time or execution time for some task.

“X is n times faster than Y” means

\[
\frac{\text{Performance}(X)}{\text{Performance}(Y)} = \frac{\text{Execution Time}(Y)}{\text{Execution Time}(X)} = n
\]

Example: the execution time of program P on X is 5 sec; on Y it is 10 sec.

X is 2 times faster than Y.
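The ratio in the example can be checked with a few lines of Python. This is only a sketch; the function name `times_faster` is an illustrative choice and does not come from the slides.

```python
def times_faster(exec_time_x: float, exec_time_y: float) -> float:
    """Return n such that X is n times faster than Y."""
    # Performance is the reciprocal of execution time, so
    # Performance(X) / Performance(Y) = ExecTime(Y) / ExecTime(X).
    return exec_time_y / exec_time_x


# Values from the slide: program P takes 5 s on X and 10 s on Y.
print(times_faster(5.0, 10.0))
```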

Page 4: Performance

What time to measure?

Elapsed time, wall-clock time:
– actual time from start to completion
– depends on CPU, system, I/O, etc.
– often used in real benchmarks
– only suitable choice when I/O is included

CPU Time:
– measure/analyze CPU performance only
– may be suitable when machine is timeshared
– possibly both user and system component
– User CPU time is our focus for first part of course

Elapsed time = CPU time + Idle time
– usually, and assuming time is accurately accounted for
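As a rough illustration of the difference between elapsed time and CPU time, the sketch below times a task with two clocks from Python's standard `time` module. The helper name `measure` is hypothetical, and `time.process_time()` reports user plus system CPU time for the current process only, so this is an approximation of the accounting described above rather than a full breakdown.

```python
import time


def measure(task):
    """Run task() and return (elapsed_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    task()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu


# Sleeping leaves the CPU idle: elapsed time grows, CPU time barely moves.
elapsed, cpu = measure(lambda: time.sleep(0.2))
print(f"elapsed {elapsed:.3f} s, CPU {cpu:.3f} s, idle about {elapsed - cpu:.3f} s")
```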

Page 5: Performance

Metrics of performance

Different performance metrics are appropriate at different levels.

Levels, from top to bottom:
– Application
– Programming Language
– Compiler
– ISA
– Datapath, Control
– Function Units
– Transistors

Metrics, from top to bottom:
– Frames per second, operations per second
– (Millions of) instructions per second: MIPS
– (Millions of) floating-point operations per second: MFLOP/s
– Cycles per second (clock rate)
– Cycles per instruction

Page 6: Performance

Relating Processor Metrics

CPU execution time per program
= CPU clock cycles/program × Clock cycle time
= CPU clock cycles/program ÷ Clock rate (frequency)

CPU clock cycles/program = Instructions/program × Clock cycles per instruction

Clock cycles per instruction (CPI) is an average measurement; it depends on:
– the ISA, the implementation, and the program measured
– CPI = CPU clock cycles/program ÷ Instructions/program
– Also, instructions per clock cycle: IPC = 1 / CPI

CPU execution time = Instructions × CPI × Clock cycle time
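The relations above translate directly into code. The following is a minimal sketch; the function names and the sample numbers (2 million instructions, 3 million cycles, a 100 MHz clock) are made up for illustration.

```python
def cpi_from_counts(clock_cycles: int, instructions: int) -> float:
    """CPI = CPU clock cycles per program / instructions per program."""
    return clock_cycles / instructions


def cpu_time(instructions: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU execution time in seconds = instructions x CPI x clock cycle time."""
    clock_cycle_time = 1.0 / clock_rate_hz
    return instructions * cpi * clock_cycle_time


# Hypothetical program: 2 million instructions, 3 million cycles, 100 MHz clock.
cpi = cpi_from_counts(3_000_000, 2_000_000)
ipc = 1 / cpi
print(cpi, ipc, cpu_time(2_000_000, cpi, 100e6))
```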

Page 7: Performance


Let’s look at the single-cycle model analytically

Page 8: Performance

Static timing analysis

– Memories: 10 ns
– Register: 5 ns
– Adders: 10 ns
– ALU: 10 ns

Use topological sort!
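One way to read "use topological sort" is as a longest-path computation over the datapath graph: visit components in topological order and give each one an arrival time equal to its own delay plus the latest arrival among its inputs. The sketch below uses the component delays from the slide, but the wiring (`PREDECESSORS`) is a simplified, assumed model of a load instruction's path, not the actual datapath from the course.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Component delays in ns, taken from the slide.
DELAY = {"imem": 10, "regs": 5, "alu": 10, "dmem": 10, "pc_adder": 10}

# Assumed wiring for a load: each component maps to the components feeding it.
PREDECESSORS = {
    "imem": [],          # instruction memory
    "pc_adder": [],      # PC + 4, runs in parallel with the fetch
    "regs": ["imem"],    # register file read
    "alu": ["regs"],     # address calculation
    "dmem": ["alu"],     # data memory read
}


def critical_path_delay(delay, predecessors):
    """Return the longest input-to-output delay through the graph."""
    arrival = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(predecessors).static_order():
        latest_input = max((arrival[p] for p in predecessors[node]), default=0)
        arrival[node] = latest_input + delay[node]
    return max(arrival.values())


print(critical_path_delay(DELAY, PREDECESSORS))
```

Under this assumed wiring the longest path runs through both memories, the register read, and the ALU.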

Page 9: Performance

[Figure: single-cycle datapath (branch logic, sign/zero extend, zero extend, ALU with inputs A and B, adders) annotated with 5 ns and 10 ns component delays, showing the path used by lw $2 const($3)]

35 ns delay

Page 10: Performance


But that path goes through the data memory!

What if this is not a load/store?

How about an instruction that does nothing?

“NOP”

Page 11: Performance

[Figure: single-cycle datapath (branch logic, sign/zero extend, zero extend, ALU with inputs A and B, adders) annotated with 5 ns and 10 ns component delays, showing the path used by NOP]

10 ns delay

Page 12: Performance

[Figure: single-cycle datapath (branch logic, sign/zero extend, zero extend, ALU with inputs A and B, adders) annotated with 5 ns and 10 ns component delays, showing the path used by Add $ra $rb $rc]

25 ns delay

Page 13: Performance

[Figure: single-cycle datapath (branch logic, sign/zero extend, zero extend, ALU with inputs A and B, adders) annotated with 5 ns and 10 ns component delays, showing the path used by B label]

20 ns delay

Page 14: Performance

35 ns for load/store, but 10 ns for NOP!?

Page 15: Performance


Amdahl’s Law:

“Make the common case fast”

Page 16: Performance

Amdahl's Law

Handy for evaluating the impact of a change not tied to the CPU performance equation.

Insight: No improvement of a feature enhances performance by more than the use of the feature.

Suppose that enhancement E accelerates fraction F of a program by a factor S (remainder of the task is unaffected):

\[
\text{ExecTime}_{E} = \left(1 - F\left(1 - \frac{1}{S}\right)\right) \times \text{ExecTime}_{\text{without } E}
\]

[Figure: execution time split into F and 1 − F; with enhancement E, the F part shrinks to F/S and the 1 − F part is unchanged]
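Amdahl's Law is easy to experiment with numerically. The sketch below rewrites the factor (1 − F(1 − 1/S)) in the equivalent form (1 − F) + F/S; the function names and the sample values (F = 0.4, S = 10) are illustrative assumptions, not figures from the slides.

```python
def exec_time_with_enhancement(exec_time_without: float, f: float, s: float) -> float:
    """Execution time after speeding up fraction f of the task by factor s."""
    # (1 - f) is untouched; the enhanced fraction f shrinks to f / s.
    return ((1.0 - f) + f / s) * exec_time_without


def overall_speedup(f: float, s: float) -> float:
    """Overall speedup = old time / new time."""
    return 1.0 / ((1.0 - f) + f / s)


# Hypothetical case: 40% of the run time is sped up 10x.
print(exec_time_with_enhancement(100.0, 0.4, 10), overall_speedup(0.4, 10))

# Even an enormous S cannot push the speedup past 1 / (1 - F).
print(overall_speedup(0.4, 1e12), 1 / (1 - 0.4))
```

The second print shows the insight from the slide: the unaffected fraction 1 − F bounds the achievable speedup no matter how large S gets.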

Page 17: Performance


What if we don’t need the ALU?

A branch instruction?

Page 18: Performance

BUT!

The single-cycle model has to accommodate the slowest instruction, even if it rarely occurs!

Page 19: Performance

How much work can our structure perform?

For a program Q:

Time = Number of executed instructions × Number of cycles per instruction × Time per cycle

T = Nq * CPI * Tc

Page 20: Performance


For the single cycle model....

CPI = 1 for all instructions

Tc determined by the slowest instruction

Page 21: Performance

How to reduce T?

T = Nq * CPI * Tc

Reduce Nq.

More powerful instructions!

More hardware, longer paths, cycle time goes up (slower machine)

Page 22: Performance

“No free lunch”

Why designers are so well paid: to optimize designs.

Page 23: Performance

How to reduce T?

T = Nq * CPI * Tc

Reduce Tc: faster hardware

Technological limits

Cost increase not linearly related

Sales volume drops

Page 24: Performance

How to reduce T?

T = Nq * CPI * Tc

Make CPI a function of the instruction

For example: NOP = 1 cycle, LW = 4 cycles

Chapter 5.4, the classical method

Page 25: Performance

How to reduce T?

T = Nq * CPI * Tc

Make CPI a function of the instruction

CPI goes up, but we can use an average, not the worst case

Tc goes down: it is the time to do the longest step, not the entire instruction

Page 26: Performance

Example

Branch:
– Step 1: fetch
– Step 2: new PC

Add:
– Step 1: fetch
– Step 2: decode / register fetch
– Step 3: compute and write back

Page 27: Performance

Example

LW = 4 steps

Cycle time = 1/4 old time

T = 4 * 1/4 old time, where 4 is the CPI of LW

Just as slow for the lw instruction, our worst case!

Page 28: Performance

But that’s not important if LW is not common!

T = Nq * CPI * 1/4 old time

CPI is averaged over the Nq executed instructions: 1.3? 1.7? Never as high as 4.0!
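The average-CPI argument can be made concrete with a small calculation. The cycle counts below come from the slides (NOP 1, branch 2, add 3, LW 4), as does the assumption that the multi-cycle clock period is one quarter of the single-cycle period; the instruction mix `MIX` is invented purely for illustration.

```python
# Cycles per instruction class in the multi-cycle design (from the slides).
CYCLES = {"nop": 1, "branch": 2, "add": 3, "lw": 4}

# Hypothetical dynamic instruction mix (fractions of executed instructions).
MIX = {"nop": 0.05, "branch": 0.20, "add": 0.50, "lw": 0.25}


def average_cpi(cycles, mix):
    """Weighted average of cycles per instruction over the executed mix."""
    return sum(mix[name] * cycles[name] for name in mix)


def speedup_over_single_cycle(cycles, mix, steps_in_slowest=4):
    """Compare T = Nq * CPI * Tc for the two designs; Nq cancels out."""
    # Single cycle: T = Nq * 1 * Tc_old.
    # Multi cycle:  T = Nq * CPI_avg * Tc_old / steps_in_slowest.
    return steps_in_slowest / average_cpi(cycles, mix)


print(average_cpi(CYCLES, MIX), speedup_over_single_cycle(CYCLES, MIX))
```

As long as the mix contains anything other than loads, the average CPI stays below 4 and the multi-cycle design comes out ahead.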

Page 29: Performance

We win because of quantitative statistical properties of our programs!

Page 30: Performance

What value of CPI do we use?

1.3? 1.5? 1.7?

Easy: Use an average program!?

Page 31: Performance


There is no such thing!

Page 32: Performance

Artificial “average programs” called “benchmarks”

Are they something to trust?

What about “peak performance values”: MIPS? MFLOPS?

We have a peak at CPI = 1: a program of only NOPs!
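To see why a peak rating says little, recall that MIPS = clock rate / (CPI × 10^6). The sketch below compares the peak value at CPI = 1 with the value at a higher average CPI; the 100 MHz clock and the CPI of 1.7 are hypothetical numbers chosen only for illustration.

```python
def mips(clock_rate_hz: float, cpi: float) -> float:
    """Millions of instructions per second = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


CLOCK_HZ = 100e6  # hypothetical 100 MHz clock

peak = mips(CLOCK_HZ, 1.0)      # every instruction takes one cycle (all NOPs)
achieved = mips(CLOCK_HZ, 1.7)  # a hypothetical average CPI for a real program
print(peak, achieved)
```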

Page 33: Performance

Why Do Benchmarks?

How we evaluate performance differences
– Across and within a single system (design & variations)

What should benchmarks do?
– Represent a large class of important programs
– Behave like typical programs: improved benchmark performance => improved performance broadly

For better or worse, benchmarks shape a field
– Good ones accelerate progress
– Bad benchmarks hurt progress
– help real programs vs. sell machines/papers?
– Enhancements that help benchmarks may not help most programs, and vice versa

Page 34: Performance

Classes of Benchmarks

(Toy) Benchmarks
– 10-100 lines
– e.g., sieve, puzzle, quicksort
– good first programming assignments

Synthetic Benchmarks
– attempt to match average frequencies of real workloads
– e.g., Whetstone, Dhrystone
– mostly good for nothing: too artificial

Kernels
– Time-critical excerpts of real programs
– e.g., Livermore loops, Linpack
– good for micro-performance studies

Real programs
– e.g., gcc, spice, Verilog, Database, stock trading

Page 35: Performance

Successful Benchmark: SPEC Collection

1987: RISC industry (workstations) mired in “bench marketing”:
– (“That is an 8 MIPS machine, but they claim 10 MIPS!”)

EE Times + 5 companies band together to form the Systems Performance Evaluation Committee (SPEC) in 1988:
– Sun, MIPS, HP, Apollo, DEC

Create standard list of programs, inputs, reporting rules:
– several real programs, including OS calls
– some I/O
– rules for running and reporting

Page 36: Performance


Multiple clock cycle designs:

State machines

Micro programming

chapter 5.4

“Computer Organization & Design”

Page 37: Performance

How to reduce T?

T = Nq * CPI * Tc

Reduce the quotient cycles / instruction
– reduce “cycles”: multiple clock-cycle design
– increase “instruction”: execute more than one instruction per cycle!

Page 38: Performance

More than one instruction per cycle?

Parallelism
– Div/mult + floating point + integer

Superscalarity
– Multiple issue etc.

Pipelining
– Of general importance