performance, power, die yield - facultystaff.richmond.edudszajda/classes/cs301/fall_2017/... · #...


Performance, Power, Die Yield

CS301, Prof. Szajda

Administrative

• HW #1 assigned
  - Due Wednesday, 9/3 at 5:00 pm

Performance Metrics

(How do we compare two machines?)

What to Measure?


Which airplane has the best performance?

Performance

• One size does not fit all
• Depends on application domain
  - Scientific computing
  - Graphics
  - Databases
  - General-purpose desktop
  - Beware of designing to the benchmark!
• Depends on technology characteristics
  - DRAM speed and capacity, chip size, etc.

Which Metric Do We Use?

• Response or execution time
  - Difference between start and end time
  - Individual user cares most about this
• Throughput
  - Total amount of work done in a given time
  - Frequently used for servers and clusters
• How are these affected by
  - Replacing the processor with a faster version?
  - Adding more processors?

Execution Time

• Shorter execution time is better

• Allows comparison between 2 machines
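The formula that usually accompanies this slide is missing from the transcript. The standard definition that makes "shorter is better" precise treats performance as the reciprocal of execution time:

```latex
\text{Performance}_X = \frac{1}{\text{Execution time}_X}
```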

Relative Performance

• “X is n times faster than Y”

• Example:
  - Machine A takes 10 s to run a program
  - Machine B takes 15 s to run the same program
  - What is the performance ratio?
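A minimal sketch of the ratio, assuming performance is defined as 1 / execution time. "X is n times faster than Y" then means Perf_X / Perf_Y = Time_Y / Time_X = n. The function names are illustrative only.

```python
def performance(execution_time_s: float) -> float:
    """Performance as the reciprocal of execution time (higher is better)."""
    return 1.0 / execution_time_s


def times_faster(time_x_s: float, time_y_s: float) -> float:
    """n in 'X is n times faster than Y': Perf_X / Perf_Y = Time_Y / Time_X."""
    return performance(time_x_s) / performance(time_y_s)


n = times_faster(10.0, 15.0)  # Machine A = 10 s, Machine B = 15 s
print(f"A is {n:.1f} times faster than B")
```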

Different Time Values

• Execution time
  - Wall-clock, response, or elapsed time
    - Includes everything (processing, I/O, OS overhead, etc.)!
  - Determines system performance
• CPU time
  - Time spent executing code for this task only
    - Does not include I/O or time-sharing
  - Comprises user CPU time and system CPU time
    - Different programs are affected differently by CPU and system performance
  - Example output of time (see man time): 90.7u 12.9s 2:39 65%
    - User: 90.7 sec
    - System: 12.9 sec
    - Elapsed time: 2 min 39 sec
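The final 65% field is CPU time (user + system) as a share of elapsed time: (90.7 + 12.9) / 159 ≈ 0.65. The sketch below parses that line. It assumes the csh-style field layout shown above with elapsed time in m:ss form; the helper name parse_csh_time is hypothetical.

```python
def parse_csh_time(line: str) -> dict:
    """Parse a csh-style time summary such as '90.7u 12.9s 2:39 65%'.

    Assumes the first four fields are user seconds ('u'), system seconds ('s'),
    elapsed time as m:ss, and the reported CPU percentage.
    """
    user_field, sys_field, elapsed_field, pct_field = line.split()[:4]
    user = float(user_field.rstrip("u"))
    system = float(sys_field.rstrip("s"))
    minutes, seconds = elapsed_field.split(":")
    elapsed = int(minutes) * 60 + float(seconds)
    return {
        "user": user,
        "system": system,
        "elapsed": elapsed,
        "reported_pct": float(pct_field.rstrip("%")),
        # CPU time (user + system) as a fraction of wall-clock time
        "cpu_share": (user + system) / elapsed,
    }


stats = parse_csh_time("90.7u 12.9s 2:39 65%")
cpu_time = stats["user"] + stats["system"]
print(f"CPU time {cpu_time:.1f} s of {stats['elapsed']:.0f} s elapsed = {stats['cpu_share']:.0%}")
```

The remaining 35% of the elapsed time is spent waiting on I/O or running other tasks, which CPU time deliberately excludes.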

Clock Cycles

• Instead of expressing time in seconds, use clock cycles

• Clock
  - Determines when events take place
  - Runs at a constant rate (e.g., 1 GHz)
  - Easy to convert between clock rate and seconds
    - Clock rate = 1 / clock cycle time
    - 500 MHz = 1 / (2 ns)
    - 1 ns = 10^-9 s

Chapter 1 — Computer Abstractions and Technology —

CPU Clocking
• Operation of digital hardware governed by a constant-rate clock

[Figure: clock (cycles) timeline labeled "Data transfer and computation", "Update state", and "Clock period"]

• Clock period: duration of a clock cycle
  - e.g., 250 ps = 0.25 ns = 250×10^-12 s
• Clock frequency (rate): cycles per second
  - e.g., 4.0 GHz = 4000 MHz = 4.0×10^9 Hz
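Clock period and clock rate are reciprocals, so converting between them is a single division. A small sketch reproducing the slide's examples (function names are illustrative):

```python
def period_from_rate(rate_hz: float) -> float:
    """Clock period in seconds: the reciprocal of the clock rate."""
    return 1.0 / rate_hz


def rate_from_period(period_s: float) -> float:
    """Clock rate in hertz: the reciprocal of the clock period."""
    return 1.0 / period_s


print(f"4.0 GHz -> {period_from_rate(4.0e9) * 1e12:.0f} ps")
print(f"500 MHz -> {period_from_rate(500e6) * 1e9:.0f} ns")
print(f"250 ps  -> {rate_from_period(250e-12) / 1e9:.1f} GHz")
```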


CPU Time

• Performance improved by
  - Reducing number of clock cycles
  - Increasing clock rate
  - Hardware designer must often trade off clock rate against cycle count
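The equation for this slide did not survive extraction. The standard relation it refers to, which shows why fewer cycles or a faster clock both reduce CPU time, is:

```latex
\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time}
                = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}
```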


CPU Time Example
• Computer A: 2 GHz clock, 10 s CPU time
• Designing Computer B
  - Aim for 6 s CPU time
  - Can use a faster clock, but that causes 1.2 × as many clock cycles
• How fast must Computer B's clock be?
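A worked sketch using CPU Time = Clock Cycles / Clock Rate: first recover Computer A's cycle count, scale it by 1.2 for Computer B, then solve for the clock rate that meets the 6 s target.

```python
# Computer A: known clock rate and CPU time
clock_rate_a = 2.0e9  # Hz
cpu_time_a = 10.0     # s
# Clock cycles = CPU time x clock rate
cycles_a = cpu_time_a * clock_rate_a

# Computer B: 1.2x as many cycles, target CPU time of 6 s
cycles_b = 1.2 * cycles_a
cpu_time_b = 6.0      # s
# Rearranging CPU time = cycles / clock rate gives clock rate = cycles / CPU time
clock_rate_b = cycles_b / cpu_time_b

print(f"Computer B needs a {clock_rate_b / 1e9:.1f} GHz clock")
```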


Instruction Count and CPI

• Instruction Count (IC) for a program
  - Determined by program, ISA, and compiler
• Average cycles per instruction (CPI)
  - Determined by CPU hardware
  - If different instructions have different CPI
    - Average CPI affected by instruction mix
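The equations that tie IC and CPI back to CPU time are missing from the transcript; the standard forms are:

```latex
\text{Clock Cycles} = \text{Instruction Count} \times \text{CPI}

\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}
               = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}
```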


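CPI Example
• Computer A: Cycle Time = 250 ps, CPI = 2.0
• Computer B: Cycle Time = 500 ps, CPI = 1.2
• Same ISA
• Which is faster, and by how much?

A is faster: CPU Time_A = IC × 2.0 × 250 ps = IC × 500 ps, while CPU Time_B = IC × 1.2 × 500 ps = IC × 600 ps

…by this much: CPU Time_B / CPU Time_A = (IC × 600 ps) / (IC × 500 ps) = 1.2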

Application Characteristics

• Determine the mix of different instruction types
  - Integer arithmetic
  - Logical operations
  - Floating-point arithmetic
  - Loads and stores

• Different applications have different CPI because of different instruction mixes


CPI in More Detail
• If different instruction classes take different numbers of cycles
• Weighted average CPI
  - Relative frequency: the fraction of all instructions that fall in class i
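The slide's formulas were images and did not survive. The standard weighted-average forms, with the relative frequency appearing as IC_i / IC, are:

```latex
\text{Clock Cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{Instruction Count}_i \right)

\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}}
           = \sum_{i=1}^{n} \left( \text{CPI}_i \times
             \underbrace{\frac{\text{Instruction Count}_i}{\text{Instruction Count}}}_{\text{relative frequency}} \right)
```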


CPI Example
• Alternative compiled code sequences using instructions in classes A, B, C

Class               A   B   C
CPI for class       1   2   3
IC in sequence 1    2   1   2
IC in sequence 2    4   1   1

• Sequence 1: IC = 5
  - Clock Cycles = 2×1 + 1×2 + 2×3 = 10
  - Avg. CPI = 10/5 = 2.0
• Sequence 2: IC = 6
  - Clock Cycles = 4×1 + 1×2 + 1×3 = 9
  - Avg. CPI = 9/6 = 1.5
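A short sketch of the weighted-CPI calculation for the two sequences, using the per-class CPIs and instruction counts from the table (the dictionary layout and helper name are illustrative):

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def cycles_and_cpi(counts: dict) -> tuple:
    """Return (instruction count, total clock cycles, average CPI) for one sequence."""
    ic = sum(counts.values())
    cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())
    return ic, cycles, cycles / ic


seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}
for name, seq in [("Sequence 1", seq1), ("Sequence 2", seq2)]:
    ic, cycles, cpi = cycles_and_cpi(seq)
    print(f"{name}: IC = {ic}, cycles = {cycles}, CPI = {cpi:.1f}")
```

Sequence 2 executes more instructions but needs fewer cycles, so on the same clock it is the faster sequence: instruction count alone is not a reliable proxy for execution time.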


Performance Summary

• Performance depends on
  - Algorithm: affects IC, possibly CPI
  - Programming language: affects IC, CPI
  - Compiler: affects IC, CPI
  - Instruction set architecture: affects IC, CPI, Tc

The BIG Picture
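The "big picture" equation itself is missing from the transcript. In its standard form, each factor in the list above maps onto one term (IC, CPI, and clock cycle time Tc), and the units cancel to seconds per program:

```latex
\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}}
  \times \frac{\text{Clock cycles}}{\text{Instruction}}
  \times \frac{\text{Seconds}}{\text{Clock cycle}}
```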

Amdahl’s Law

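• How much speedup do you get from an enhancement?
• Based on
  - Fraction of time enhancement used
  - Improvement in enhanced mode

$$\text{Speedup} = \frac{\text{Execution time w/o enhancement}}{\text{Execution time w/ enhancement}}$$

$$\text{Exec}_{\text{new}} = \text{Exec}_{\text{old}} \times \left( (1 - \text{fraction}_{\text{enh}}) + \frac{\text{fraction}_{\text{enh}}}{\text{Speedup}_{\text{enh}}} \right)$$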


Pitfall: Amdahl’s Law
• Improving an aspect of a computer and expecting a proportional improvement in overall performance

§1.10 Fallacies and Pitfalls

• Example: multiply accounts for 80 s out of 100 s
  - How much improvement in multiply performance to get 5× overall?
  - 5× overall means 20 s total, so we would need 80/n + 20 = 20, which no finite n satisfies
  - Can’t be done!
• Corollary: make the common case fast

Review Question

• Your machine has a clock rate of 2.4 GHz. How long is the clock cycle?
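Since clock cycle time is the reciprocal of clock rate, the answer works out as:

```latex
T_c = \frac{1}{2.4 \times 10^{9}\ \text{Hz}} \approx 4.17 \times 10^{-10}\ \text{s} \approx 0.417\ \text{ns} \approx 417\ \text{ps}
```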

Review Questions

• Suppose you are given the following:
  - Machine A
    - 1 GHz
    - Average CPI = 1.6
    - Instructions = 1.7 billion
  - Machine B
    - 3.3 GHz
    - Average CPI = 6.1
    - Instructions = 2 billion

• Which machine is faster? By how much?
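One way to check the answer is to apply CPU Time = IC × CPI / Clock Rate to each machine and compare. A minimal sketch (function name illustrative):

```python
def cpu_time(instructions: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = instruction count x CPI / clock rate."""
    return instructions * cpi / clock_rate_hz


time_a = cpu_time(1.7e9, 1.6, 1.0e9)
time_b = cpu_time(2.0e9, 6.1, 3.3e9)
faster = "A" if time_a < time_b else "B"
ratio = max(time_a, time_b) / min(time_a, time_b)
print(f"A: {time_a:.2f} s, B: {time_b:.2f} s -> {faster} is {ratio:.2f}x faster")
```

Machine B has the higher clock rate, yet Machine A finishes first (about 2.72 s versus 3.70 s, roughly 1.36× faster) because B's much higher CPI outweighs its faster clock.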

Review Questions

• What is the average CPI for a machine with the following CPIs on an application with the following instruction frequency?

Type         Frequency   CPI
Arithmetic   0.45        1
Memory       0.3         8
Control      0.2         3
Mult/Div     0.05        5
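Because the frequencies sum to 1, the average CPI is the frequency-weighted sum of the per-type CPIs:

```latex
\text{CPI} = 0.45 \times 1 + 0.3 \times 8 + 0.2 \times 3 + 0.05 \times 5
           = 0.45 + 2.4 + 0.6 + 0.25 = 3.7
```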

Review Questions

• What factors must be included when comparing the relative performance of two machines?

Amdahl’s Law

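• Suppose you have an enhancement that makes a function 10× faster.
• Speedup if used 5% of the time?
• Speedup if used 40% of the time?

$$\text{Exec}_{\text{new}} = \text{Exec}_{\text{old}} \times \left( (1 - \text{fraction}_{\text{enh}}) + \frac{\text{fraction}_{\text{enh}}}{\text{Speedup}_{\text{enh}}} \right)$$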
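Overall speedup is Exec_old / Exec_new, so dividing through by Exec_old in the formula on this slide gives 1 / ((1 − fraction_enh) + fraction_enh / Speedup_enh). A minimal sketch (the function name amdahl_speedup is illustrative):

```python
def amdahl_speedup(fraction_enh: float, speedup_enh: float) -> float:
    """Overall speedup Exec_old / Exec_new from Amdahl's Law."""
    if not 0.0 <= fraction_enh <= 1.0:
        raise ValueError("fraction_enh must be between 0 and 1")
    new_over_old = (1.0 - fraction_enh) + fraction_enh / speedup_enh
    return 1.0 / new_over_old


for f in (0.05, 0.40):
    print(f"used {f:.0%} of the time -> overall speedup {amdahl_speedup(f, 10):.3f}")
```

With 5% usage the overall gain is only about 1.047×; with 40% it is about 1.56×. Even an infinitely fast enhancement used 40% of the time could not exceed 1 / 0.6 ≈ 1.67×, which is the same limit behind the "make the common case fast" corollary.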

Review Questions

• What is the equation for execution time?

• What does Amdahl’s Law say?

Benchmarks

• Programs specifically used to measure performance

• The hope is that they are representative of how the computer will be used

• Examples
  - SPEC Integer and Floating Point
  - MediaBench
  - MineBench
  - TPC


SPEC CPU Benchmark
• Programs used to measure performance
  - Supposedly typical of actual workload
• Standard Performance Evaluation Corp (SPEC)
  - Develops benchmarks for CPU, I/O, Web, …
• SPEC CPU2006
  - Elapsed time to execute a selection of programs
    - Negligible I/O, so focuses on CPU performance
  - Normalize relative to reference machine
  - Summarize as geometric mean of performance ratios
    - CINT2006 (integer) and CFP2006 (floating-point)
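A sketch of the summarization step, using made-up times rather than real SPEC data: each program's ratio is reference time / measured time, and the ratios are combined with a geometric mean. One reason the geometric mean is used is that the ratio of two machines' scores does not depend on which reference machine is chosen, which the loop below demonstrates. Function and variable names are illustrative.

```python
from statistics import geometric_mean


def spec_ratios(reference_times, measured_times):
    """Per-program ratio: reference time / measured time (higher is better)."""
    if len(reference_times) != len(measured_times):
        raise ValueError("need one measured time per reference time")
    return [ref / t for ref, t in zip(reference_times, measured_times)]


def spec_score(reference_times, measured_times):
    """Summarize the per-program ratios with their geometric mean."""
    return geometric_mean(spec_ratios(reference_times, measured_times))


# Hypothetical times in seconds for three programs (not real SPEC data).
machine_x = [100.0, 100.0, 100.0]
machine_y = [50.0, 400.0, 100.0]
ref_1 = [200.0, 400.0, 800.0]
ref_2 = [1000.0, 30.0, 70.0]

for ref in (ref_1, ref_2):
    x, y = spec_score(ref, machine_x), spec_score(ref, machine_y)
    print(f"score X = {x:.2f}, score Y = {y:.2f}, X/Y = {x / y:.3f}")
```

Both reference machines give the same X/Y ratio (the cube root of 2, about 1.26), even though the individual scores differ.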


CINT2006 for Intel Core i7 920


Recent Concern: Power Trends

• In CMOS IC technology

§1.7 The Power Wall

[Figure: power trends; annotations ×1000, ×30, 5V → 1V]
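The slide's power equation did not survive extraction. The standard relation for dynamic power in CMOS is given below. Because voltage enters squared, lowering the supply from 5 V to 1 V by itself cuts that factor by (5/1)^2 = 25×, all else being equal.

```latex
\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}
```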

Tricks to Increase Power

• Attach large cooling devices
• Turn off parts of chips not used in a given clock cycle
  - Can increase power to 300 watts...
  - ...But these and other ways are all prohibitively expensive for desktop computers. So...


More Recent Approaches: Chip Multiprocessors

• Reasons for change
  - Limited opportunities to improve single-thread performance
  - Power
  - On-chip communication latencies

Tapering Processor Performance


Uniprocessor Performance

§1.8 The Sea Change: The Switch to Multiprocessors

Constrained by power, instruction-level parallelism, memory latency


Multiprocessors
• Multicore microprocessors
  - More than one processor per chip
• Requires explicitly parallel programming
  - Compare with instruction-level parallelism
    - Hardware executes multiple instructions at once
    - Hidden from the programmer
  - Hard to do
    - Programming for performance
    - Load balancing
    - Optimizing communication and synchronization


Concluding Remarks
• Cost/performance is improving
  - Due to underlying technology development
• Hierarchical layers of abstraction
  - In both hardware and software
• Instruction set architecture
  - The hardware/software interface
• Execution time: the best performance measure
• Power is a limiting factor
  - Use parallelism to improve performance

§1.9 Concluding Remarks