Chapter 4: Assessing and Understanding Performance

Bo Cheng


Which One Is Good?

| Airplane | Passengers | Range (mi) | Speed (mph) |
|---|---|---|---|
| Boeing 737-100 | 101 | 630 | 598 |
| Boeing 747 | 470 | 4150 | 610 |
| BAC/Sud Concorde | 132 | 4000 | 1350 |
| Douglas DC-8-50 | 146 | 8720 | 544 |

The answer depends on the measure of performance:
• Cruising speed
• Longest range
• Largest capacity
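As a quick illustration, the Python sketch below ranks the four airplanes by each measure in turn. The `AIRPLANES` and `best_by` names are hypothetical; only the numbers come from the slide.

```python
# Hypothetical names; data from the slide:
# airplane -> (passengers, range in miles, cruising speed in mph)
AIRPLANES = {
    "Boeing 737-100": (101, 630, 598),
    "Boeing 747": (470, 4150, 610),
    "BAC/Sud Concorde": (132, 4000, 1350),
    "Douglas DC-8-50": (146, 8720, 544),
}

# Position of each measure inside the tuples above.
MEASURES = {"capacity": 0, "range": 1, "speed": 2}


def best_by(measure: str) -> str:
    """Return the airplane with the largest value for the given measure."""
    index = MEASURES[measure]
    return max(AIRPLANES, key=lambda name: AIRPLANES[name][index])


if __name__ == "__main__":
    for measure in MEASURES:
        print(f"{measure:>8}: {best_by(measure)}")
```

Each measure picks a different winner: the Boeing 747 for capacity, the Douglas DC-8-50 for range, and the Concorde for speed.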

Measuring Performance

Elapsed time (also called wall-clock time or response time)
– Total time to complete a task, including disk and memory accesses, I/O, etc.
– A useful number, but often not good for comparison purposes

CPU (execution) time
– Doesn't count I/O or time spent running other programs
– Can be broken up into system CPU time and user CPU time

CPU time = user CPU time + system CPU time

Our focus: user CPU time
– Time spent executing the lines of code that are "in" our program
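The difference between elapsed time and CPU time can be observed directly. The following is a minimal Python sketch, assuming the standard-library `os.times()` (which reports user and system CPU time for the current process, at coarse resolution) and `time.perf_counter()` for wall-clock time. The workload is hypothetical: a CPU-bound loop followed by a sleep, which adds elapsed time but almost no CPU time.

```python
import os
import time


def busy_work(n: int) -> int:
    """Hypothetical CPU-bound loop: accumulates user CPU time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def busy_then_wait() -> None:
    """Hypothetical task: compute, then wait (waiting adds elapsed time only)."""
    busy_work(500_000)
    time.sleep(0.3)


def measure(fn):
    """Return (elapsed, user CPU, system CPU) in seconds for one call of fn()."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    fn()
    wall_after = time.perf_counter()
    cpu_after = os.times()
    return (
        wall_after - wall_before,
        cpu_after.user - cpu_before.user,
        cpu_after.system - cpu_before.system,
    )


if __name__ == "__main__":
    elapsed, user, system = measure(busy_then_wait)
    print(f"elapsed:    {elapsed:.3f} s")
    print(f"user CPU:   {user:.3f} s")
    print(f"system CPU: {system:.3f} s")
    print(f"CPU time:   {user + system:.3f} s (user + system)")
```

On a typical run the elapsed time is well above user + system CPU time, because the sleep counts only toward elapsed time.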


CPU Performance Metrics

Response time: the time between the start and the completion of a task (in time units)

Throughput: the total amount of work done in a given time (in number of tasks per unit of time)
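To see how both metrics come from the same run, here is a small Python sketch over a hypothetical log of (start, completion) times in seconds. `response_times` and `throughput` are illustrative helper names, and throughput is measured here from the first start to the last completion, which is one reasonable convention among several.

```python
# Hypothetical helper names for the two metrics defined above.
def response_times(tasks: list[tuple[float, float]]) -> list[float]:
    """Response time of each task: completion time minus start time."""
    return [end - start for start, end in tasks]


def throughput(tasks: list[tuple[float, float]]) -> float:
    """Tasks completed per second, from the first start to the last completion."""
    first_start = min(start for start, _ in tasks)
    last_end = max(end for _, end in tasks)
    return len(tasks) / (last_end - first_start)


if __name__ == "__main__":
    # Hypothetical log of (start, completion) times in seconds.
    log = [(0.0, 2.0), (0.5, 2.5), (1.0, 3.0), (1.5, 4.0)]
    print(response_times(log))  # [2.0, 2.0, 2.0, 2.5]
    print(throughput(log))      # 1.0 (4 tasks in 4.0 s)
```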

Performance

Problem: Machine A runs a program in 10 sec. Machine B runs the same program in 15 sec. How much faster is A than B?

$$\text{Performance}_x = \frac{1}{\text{Execution\_time}_x}, \qquad \frac{\text{Performance}_x}{\text{Performance}_y} = \frac{\text{Execution\_time}_y}{\text{Execution\_time}_x} = n$$

$$n = \frac{\text{Execution\_time}_B}{\text{Execution\_time}_A} = \frac{15}{10} = 1.5$$

A is 1.5 times faster than B
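The same calculation as a short Python sketch; `performance` and `times_faster` are hypothetical helper names that mirror the formula.

```python
# Hypothetical helper names mirroring Performance_x = 1 / Execution_time_x.
def performance(execution_time: float) -> float:
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time


def times_faster(time_x: float, time_y: float) -> float:
    """n in 'x is n times faster than y': Perf_x / Perf_y = time_y / time_x."""
    return time_y / time_x


if __name__ == "__main__":
    print(times_faster(10, 15))  # 1.5: A (10 s) is 1.5 times faster than B (15 s)
```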

Clock Rate Measurement

| Name | Example | Measurement |
|---|---|---|
| Millisecond | 1 msec (ms) | 1E-03 s |
| Microsecond | 1 usec (us) | 1E-06 s |
| Nanosecond | 1 nsec (ns) | 1E-09 s |
| Picosecond | 1 psec (ps) | 1E-12 s |
| Femtosecond | 1 fsec (fs) | 1E-15 s |

• 10 nsec clock cycle => 100 MHz clock rate
• 1 nsec clock cycle => 1 GHz clock rate
• 500 psec clock cycle => 2 GHz clock rate
• 200 psec clock cycle => 5 GHz clock rate

• Clock cycle: the time for one clock period, running at a constant rate
• Clock rate is given in Hz (= 1/sec)
• clock_cycle_time = 1/clock_rate (in sec)
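The cycle-time/clock-rate conversions can be checked with two one-line Python helpers (hypothetical names), which reproduce the four examples above.

```python
# Hypothetical helper names; clock_cycle_time = 1 / clock_rate.
def rate_from_cycle_time(cycle_time_s: float) -> float:
    """Clock rate in Hz from a clock cycle time in seconds."""
    return 1.0 / cycle_time_s


def cycle_time_from_rate(clock_rate_hz: float) -> float:
    """Clock cycle time in seconds from a clock rate in Hz."""
    return 1.0 / clock_rate_hz


if __name__ == "__main__":
    examples = [("10 ns", 10e-9), ("1 ns", 1e-9), ("500 ps", 500e-12), ("200 ps", 200e-12)]
    for label, t in examples:
        print(f"{label} clock cycle -> {rate_from_cycle_time(t) / 1e6:,.0f} MHz")
    print(f"4 GHz clock -> {cycle_time_from_rate(4e9) * 1e12:.0f} ps cycle")
```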

MHz

One MHz represents one million cycles per second.

The speed of a microprocessor, called its clock speed, is measured in megahertz.
– For example, a microprocessor that runs at 200 MHz executes 200 million cycles per second.

One GHz represents 1 billion cycles per second.

http://www.webopedia.com/TERM/M/MHz.html

CPU Time or CPU Execution Time

The actual time the CPU spends computing for a specific task

It covers the time the CPU spends computing the given program, including operating-system routines executed on the program's behalf, but not the time spent waiting for I/O or running other programs.

Performance of processor/memory = 1 / CPU_time

CPU Execution Time Formula

E = CPU execution time for a program
N = number of CPU clock cycles for a program
T = clock cycle time
R = clock rate

$$E = N \times T = \frac{N}{R}$$
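Both forms of the formula give the same CPU time because T = 1/R. A minimal Python sketch, with hypothetical helper names and a made-up program of 40 billion cycles on a 4 GHz clock (250 ps cycle):

```python
# Hypothetical helper names for E = N * T = N / R.
def cpu_time_from_cycle_time(cycles: float, cycle_time_s: float) -> float:
    """E = N * T."""
    return cycles * cycle_time_s


def cpu_time_from_clock_rate(cycles: float, clock_rate_hz: float) -> float:
    """E = N / R (equivalent, since T = 1 / R)."""
    return cycles / clock_rate_hz


if __name__ == "__main__":
    n = 40e9  # hypothetical cycle count
    print(cpu_time_from_clock_rate(n, 4e9))      # 10.0 seconds
    print(cpu_time_from_cycle_time(n, 250e-12))  # about 10 seconds (floating point)
```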

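Example

Computer A (4 GHz) runs a job in 10 seconds. Computer B (X GHz) must run the same job in 6 seconds, but it needs 1.2 times as many clock cycles as A. What clock rate R does B need?

$$10 = \frac{N}{4 \times 10^9} \;\Rightarrow\; N = 40 \times 10^9 \text{ cycles}$$

$$6 = \frac{1.2 \times N}{R} \;\Rightarrow\; R = \frac{1.2 \times 40 \times 10^9}{6} = 8 \times 10^9 \text{ Hz}$$

R = 8 GHz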
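The same two-step reasoning (find N from Computer A, then solve for R on Computer B) as a Python sketch; `required_clock_rate` is a hypothetical helper name.

```python
# Hypothetical helper name; follows E = N / R, so N = E * R and R = N / E.
def required_clock_rate(time_a: float, rate_a_hz: float,
                        cycle_ratio: float, target_time_b: float) -> float:
    """Clock rate B needs in order to finish in target_time_b seconds."""
    cycles_a = time_a * rate_a_hz          # N_A = E_A * R_A
    cycles_b = cycle_ratio * cycles_a      # B needs cycle_ratio times as many cycles
    return cycles_b / target_time_b        # R_B = N_B / E_B


if __name__ == "__main__":
    rate_b = required_clock_rate(10, 4e9, 1.2, 6)
    print(f"R_B = {rate_b / 1e9:.1f} GHz")  # R_B = 8.0 GHz
```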

Clock Cycles per Instruction (CPI)

$$N = I \times C$$

N = number of CPU clock cycles for a program
I = total instructions for a program
C = CPI

• CPI is the average number of clock cycles per instruction for a program or program fragment

The Big Picture

$$E = N \times T = \frac{N}{R} = I \times C \times T = \frac{I \times C}{R}$$

$$\text{Time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

• Instruction count depends on the architecture, but not on the exact implementation
• Average CPI depends on design details and on the mix of types of instructions executed in an application
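Combining N = I × C with E = N / R gives the full equation. A minimal Python sketch with hypothetical helper names, using a made-up program (10 billion instructions, average CPI of 1.5, 3 GHz clock) rather than data from the slides:

```python
# Hypothetical helper names for N = I * C and E = I * C / R.
def clock_cycles(instructions: float, cpi: float) -> float:
    """N = I * C."""
    return instructions * cpi


def execution_time(instructions: float, cpi: float, clock_rate_hz: float) -> float:
    """E = I * C * T = (I * C) / R."""
    return clock_cycles(instructions, cpi) / clock_rate_hz


if __name__ == "__main__":
    print(execution_time(10e9, 1.5, 3e9))  # 5.0 seconds
```

Because the three factors multiply, improving any one of them by some factor improves execution time by the same factor, provided the other two do not change.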

Understanding Program Performance

| | Instruction Count | CPI | Clock Rate |
|---|---|---|---|
| Algorithm | X | Possibly | |
| Programming Language | X | X | |
| Compiler | X | X | |
| ISA | X | X | X |

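Using Performance Equation

| | Clock Cycle Time | CPI |
|---|---|---|
| Computer A | 250 ps | 2.0 |
| Computer B | 500 ps | 1.2 |

Both computers execute the same number of instructions, I, for this program. Which computer is faster for this program, and by how much?

$$\text{CPU}_A = I \times 2.0 \times 250 \text{ ps} = 500 \times I \text{ ps}$$

$$\text{CPU}_B = I \times 1.2 \times 500 \text{ ps} = 600 \times I \text{ ps}$$

$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{CPU}_B}{\text{CPU}_A} = \frac{600 \times I}{500 \times I} = 1.2$$

Computer A is 1.2 times faster than Computer B for this program.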
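A Python sketch of this comparison; the instruction count is an arbitrary placeholder (it cancels in the ratio), and `cpu_time_ps` is a hypothetical helper.

```python
# Hypothetical helper: CPU time = I * CPI * clock cycle time (in picoseconds).
def cpu_time_ps(instructions: float, cpi: float, cycle_time_ps: float) -> float:
    return instructions * cpi * cycle_time_ps


if __name__ == "__main__":
    instructions = 1_000_000  # placeholder; the same I is used for both machines
    time_a = cpu_time_ps(instructions, 2.0, 250)  # 500 * I ps
    time_b = cpu_time_ps(instructions, 1.2, 500)  # 600 * I ps
    print(f"A is {time_b / time_a:.1f} times faster than B")  # A is 1.2 times faster than B
```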

Computing CPI

• Done by looking at the different types of instructions and using their individual cycle counts

$$\text{Clock\_cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times C_i)$$

C_i: the number of instructions of class i executed
CPI_i: the average number of cycles per instruction for that instruction class
n: the number of instruction classes

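Example

| Instruction class | A | B | C |
|---|---|---|---|
| CPI for this instruction class | 1 | 2 | 3 |

| Code sequence | Instruction count, class A | Instruction count, class B | Instruction count, class C |
|---|---|---|---|
| 1 | 2 | 1 | 2 |
| 2 | 4 | 1 | 1 |

$$CC_1 = (2 \times 1) + (1 \times 2) + (2 \times 3) = 10, \qquad \text{CPI}_1 = \frac{CC_1}{2 + 1 + 2} = \frac{10}{5} = 2$$

$$CC_2 = (4 \times 1) + (1 \times 2) + (1 \times 3) = 9, \qquad \text{CPI}_2 = \frac{CC_2}{4 + 1 + 1} = \frac{9}{6} = 1.5$$

Sequence 2 executes more instructions (6 vs. 5) but needs fewer clock cycles (9 vs. 10), so it is the faster of the two.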
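The weighted-sum formula for clock cycles, applied to the two code sequences, in a short Python sketch (helper names are hypothetical; the CPIs and instruction counts are from the slide):

```python
# CPI of each instruction class (from the slide).
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(counts: dict[str, int]) -> int:
    """Hypothetical helper: clock cycles = sum over classes of CPI_i * C_i."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())


def average_cpi(counts: dict[str, int]) -> float:
    """Hypothetical helper: average CPI = clock cycles / instruction count."""
    return total_cycles(counts) / sum(counts.values())


if __name__ == "__main__":
    sequences = {
        1: {"A": 2, "B": 1, "C": 2},
        2: {"A": 4, "B": 1, "C": 1},
    }
    for seq, counts in sequences.items():
        print(f"sequence {seq}: {total_cycles(counts)} cycles, CPI = {average_cpi(counts)}")
```

This prints 10 cycles with CPI = 2.0 for sequence 1, and 9 cycles with CPI = 1.5 for sequence 2.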


Workload

A set of programs used for evaluating a computer or a system

Benchmarks: programs specifically chosen to measure performance.

SPEC 2000 benchmarks (12 integer, 14 floating-point programs).

Benchmark results can be misleading if the system (or its compiler) has been tuned specifically for the benchmarks.

Benchmark

Programs specifically chosen to measure performance

Performance is best determined by running a real application
– Use programs typical of the expected workload
– e.g., compilers/editors, scientific applications, graphics...

Small benchmarks
– Nice for architects and designers

SPEC (System Performance Evaluation Cooperative)
– Companies have agreed on a set of real programs and inputs

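Simplest Approach

| | Computer A | Computer B |
|---|---|---|
| Program 1 (sec) | 1 | 10 |
| Program 2 (sec) | 1000 | 100 |
| Total (sec) | 1001 | 110 |

Comparing total execution time:

$$\frac{\text{Performance}_B}{\text{Performance}_A} = \frac{\text{Execution\_Time}_A}{\text{Execution\_Time}_B} = \frac{1001}{110} = 9.1$$

By this measure, B is 9.1 times faster than A for the two programs taken together.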
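A Python sketch of the total-time comparison that also prints the per-program ratios; the helper names are hypothetical and the times are from the table.

```python
# Execution times in seconds from the slide: [Program 1, Program 2].
TIMES = {"A": [1, 1000], "B": [10, 100]}


def total_time(machine: str) -> int:
    """Hypothetical helper: total execution time over all programs."""
    return sum(TIMES[machine])


def speedup_of_b_over_a() -> float:
    """Performance_B / Performance_A = total time of A / total time of B."""
    return total_time("A") / total_time("B")


if __name__ == "__main__":
    print(total_time("A"), total_time("B"))  # 1001 110
    print(round(speedup_of_b_over_a(), 1))   # 9.1
    # Per-program time ratios (A time / B time) tell a different story:
    for prog, (a, b) in enumerate(zip(TIMES["A"], TIMES["B"]), start=1):
        print(f"program {prog}: A/B time ratio = {a / b}")
```

Summing raw times weights each program by how long it runs, so Program 2 dominates: A is 10 times faster on Program 1, yet B comes out 9.1 times faster in total.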

Evaluating Performance

Different classes and applications of computers require different types of benchmarks.

Desktop
– CPU performance: the SPEC CPU benchmarks measure CPU performance and response time
– Benchmarks focusing on a specific task, such as DVD playback or the graphics performance of games

Server
– Depends on the nature of the intended application
– Throughput
– Requirements on response time to individual events, such as database queries and web page requests
– SPECweb99

Embedded computing
– EEMBC

Reproducibility: list everything another experimenter needs to duplicate the results.

SPEC CPU2000 Benchmark

SPEC: CINT2000 and CFP2000

Relative Performance in Three Different Modes

Relative Energy Efficiency Comparison

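Amdahl’s Law

Execution Time After Improvement = (Execution Time Affected / Amount of Improvement) + Execution Time Unaffected

Example: Suppose a program runs in 100 seconds on a machine, with the multiply operation responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 5 times faster?

Running 5 times faster means finishing in 100 / 5 = 20 seconds, so

$$ET_{after} = \frac{80}{n} + (100 - 80) = 20 \text{ sec} \;\Rightarrow\; \frac{80}{n} = 0$$

No finite improvement n satisfies this: the 20 seconds not spent on multiplication already use up the whole time budget, so improving multiplication alone cannot make the program 5 times faster.

Principle: Make the common case fast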
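Amdahl's Law and the example above in Python: `required_improvement` (a hypothetical helper) solves for the improvement factor n and returns `None` when no finite n works. The 4× target is an extra illustration, not from the slide.

```python
from typing import Optional


def time_after_improvement(affected: float, unaffected: float, improvement: float) -> float:
    """Amdahl's Law: affected / improvement + unaffected."""
    return affected / improvement + unaffected


def required_improvement(total: float, affected: float, target_speedup: float) -> Optional[float]:
    """Hypothetical helper: improvement needed for the affected part.

    Returns None when no finite improvement reaches target_speedup.
    """
    unaffected = total - affected
    target_time = total / target_speedup
    budget = target_time - unaffected  # time left over for the affected part
    if budget <= 0:
        return None
    return affected / budget


if __name__ == "__main__":
    print(required_improvement(100, 80, 5))  # None: 5x is unreachable
    print(required_improvement(100, 80, 4))  # 16.0
    # Even an infinitely fast multiply leaves the 20 unaffected seconds:
    print(time_after_improvement(80, 20, float("inf")))  # 20.0
```

With a 4× target, multiplication would have to become 16 times faster (80/16 + 20 = 25 s).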

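MIPS (million instructions per second)

$$\text{MIPS} = \frac{\text{Instruction\_count}}{\text{Execution\_time} \times 10^6}$$

Example: two compilers generate code for the same program, which runs on a 4 GHz machine.

| Instruction class | CPI |
|---|---|
| A | 1 |
| B | 2 |
| C | 3 |

| Code from | Class A count (billions) | Class B count (billions) | Class C count (billions) |
|---|---|---|---|
| Compiler 1 | 5 | 1 | 1 |
| Compiler 2 | 10 | 1 | 1 |

Compiler 1:

$$CC_1 = (5 \times 1 + 1 \times 2 + 1 \times 3) \times 10^9 = 10 \times 10^9, \qquad E_1 = \frac{10 \times 10^9}{4 \times 10^9} = 2.5 \text{ sec}$$

$$\text{MIPS}_1 = \frac{(5 + 1 + 1) \times 10^9}{2.5 \times 10^6} = 2800$$

Compiler 2:

$$CC_2 = (10 \times 1 + 1 \times 2 + 1 \times 3) \times 10^9 = 15 \times 10^9, \qquad E_2 = \frac{15 \times 10^9}{4 \times 10^9} = 3.75 \text{ sec}$$

$$\text{MIPS}_2 = \frac{(10 + 1 + 1) \times 10^9}{3.75 \times 10^6} = 3200$$

Compiler 2's code has the higher MIPS rating but the longer execution time.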
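A Python sketch that recomputes execution time and MIPS for both compilers; the 4 GHz clock follows the slide's calculation, and the helper names are hypothetical.

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}
CLOCK_RATE_HZ = 4e9  # 4 GHz, as in the slide's calculation

# Instruction counts per class, in billions (from the slide).
PROGRAMS = {
    "Compiler 1": {"A": 5, "B": 1, "C": 1},
    "Compiler 2": {"A": 10, "B": 1, "C": 1},
}


def execution_time_s(counts: dict[str, int]) -> float:
    """Hypothetical helper: E = (sum of CPI_i * C_i) / R."""
    cycles = sum(CPI_BY_CLASS[cls] * n * 1e9 for cls, n in counts.items())
    return cycles / CLOCK_RATE_HZ


def mips(counts: dict[str, int]) -> float:
    """Hypothetical helper: MIPS = instruction count / (E * 10^6)."""
    instructions = sum(counts.values()) * 1e9
    return instructions / (execution_time_s(counts) * 1e6)


if __name__ == "__main__":
    for name, counts in PROGRAMS.items():
        print(f"{name}: {execution_time_s(counts):.2f} s, {mips(counts):.0f} MIPS")
```

Compiler 2's code earns the higher MIPS rating yet runs longer, which is why execution time, not MIPS, is the metric to trust.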

Always trust the execution time metric!

http://www.faculty.uaf.edu/ffdr/EE443/Handouts/Set5_Sp05_3pp.pdf

A Complete Example (I)

http://www.faculty.uaf.edu/ffdr/EE443/Handouts/Set5_Sp05_3pp.pdf

A Complete Example (II)

A Complete Example (III)

Three problems with using MIPS

• MIPS specifies the instruction execution rate but does not take into account the capabilities of the instructions.
– We cannot compare computers with different instruction sets using MIPS, since their instruction counts for the same program will generally differ.

• MIPS varies between programs on the same computer.
– A computer cannot have a single MIPS rating for all programs.

• MIPS can vary inversely with performance.
– In the two-compiler example, Compiler 2's code has the higher MIPS rating (3200 vs. 2800) but takes longer to run (3.75 s vs. 2.5 s).