PERFORMANCE

M. C. Felipe Santiago Espinosa

March 2018

Master's in Electronics, Computer Architecture

Unit 2




Defining Performance

• Which airplane has the best performance?


Response Time and Throughput

• Response time
  – How long it takes to do a task
• Throughput
  – Total work done per unit time
    • e.g., tasks/transactions/… per hour
• How are response time and throughput affected by
  – Replacing the processor with a faster version?
  – Adding more processors?
• We’ll focus on response time for now…


Relative Performance

• Define Performance = 1/Execution Time
• “X is n times faster than Y” means

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

• Example: time taken to run a program
  – 10 s on A, 15 s on B
  – Execution Time_B / Execution Time_A = 15 s / 10 s = 1.5
  – So A is 1.5 times faster than B
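The relative-performance ratio can be checked with a few lines of Python. This is a minimal sketch; the function name `relative_performance` is an illustrative choice, not something from the slides.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Performance is 1 / execution time, so
    Performance_X / Performance_Y = time_y / time_x.
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Example from the slide: 10 s on A, 15 s on B.
n = relative_performance(time_x=10.0, time_y=15.0)
print(f"A is {n} times faster than B")
```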


Measuring Execution Time

• Elapsed time
  – Total response time, including all aspects
    • Processing, I/O, OS overhead, idle time
  – Determines system performance
• CPU time
  – Time spent processing a given job
    • Discounts I/O time, other jobs’ shares
  – Comprises user CPU time and system CPU time
  – Different programs are affected differently by CPU and system performance
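The difference between elapsed time and CPU time can be observed directly. The sketch below uses Python's standard `time.perf_counter()` (wall clock) and `time.process_time()` (user plus system CPU time of the current process). The workload `sample_job` is hypothetical, and its `sleep` call only stands in for I/O or idle time.

```python
import time


def measure(work):
    """Run work() and return (elapsed_seconds, cpu_seconds).

    time.perf_counter() tracks wall-clock (elapsed) time;
    time.process_time() tracks user + system CPU time of this
    process and does not advance while the process sleeps.
    """
    start_wall = time.perf_counter()
    start_cpu = time.process_time()
    work()
    elapsed = time.perf_counter() - start_wall
    cpu = time.process_time() - start_cpu
    return elapsed, cpu


def sample_job():
    # Hypothetical workload: a little computation plus a wait
    # that stands in for I/O or idle time.
    sum(i * i for i in range(100_000))
    time.sleep(0.05)


elapsed, cpu = measure(sample_job)
print(f"elapsed: {elapsed:.3f} s, CPU: {cpu:.3f} s")
```

On a typical run the elapsed time includes the 50 ms wait while the CPU time does not, which is the distinction the slide draws.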


CPU Clocking

• Operation of digital hardware governed by a constant-rate clock

[Figure: clock timing diagram. Labels: clock (cycles), data transfer and computation, update state, clock period.]

• Clock period: duration of a clock cycle
  – e.g., 250 ps = 0.25 ns = 250×10⁻¹² s
• Clock frequency (rate): cycles per second
  – e.g., 4.0 GHz = 4000 MHz = 4.0×10⁹ Hz
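Clock period and clock frequency are reciprocals of each other, so the two example values on this slide describe the same clock. A small sketch of the conversion, with illustrative helper names:

```python
def clock_rate_hz(period_s: float) -> float:
    """Clock frequency is the reciprocal of the clock period."""
    return 1.0 / period_s


def clock_period_s(rate_hz: float) -> float:
    """Clock period is the reciprocal of the clock frequency."""
    return 1.0 / rate_hz


period = 250e-12  # 250 ps, the period quoted on the slide
rate = clock_rate_hz(period)
print(f"{period * 1e12:.0f} ps period -> {rate / 1e9:.1f} GHz")
```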


CPU Time

$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$

• Performance improved by
  – Reducing number of clock cycles
  – Increasing clock rate (frequency)
• Hardware designer must often trade off clock rate against cycle count


CPU Time Example

• Computer A: 2 GHz clock, and for some program: 10 s CPU time
• Designing Computer B
  – Aim for 6 s CPU time (the same program)
  – Can use a faster clock, but this causes 1.2 × as many clock cycles
• How fast must Computer B’s clock be (that is, what clock frequency does Computer B need)?
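One way to work the question is to apply CPU Time = CPU Clock Cycles / Clock Rate twice: first to find the cycle count on A, then to find the rate B needs. The sketch below does this; the helper names are illustrative.

```python
def clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """CPU Clock Cycles = CPU Time x Clock Rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles: float, target_time_s: float) -> float:
    """Clock Rate = CPU Clock Cycles / CPU Time."""
    return cycles / target_time_s


cycles_a = clock_cycles(cpu_time_s=10.0, clock_rate_hz=2e9)  # 20e9 cycles
cycles_b = 1.2 * cycles_a                                    # 24e9 cycles
rate_b = required_clock_rate(cycles_b, target_time_s=6.0)
print(f"Computer B needs about {rate_b / 1e9:.1f} GHz")
```

With the slide's numbers this gives a 4 GHz clock for Computer B, twice the rate of A for a 1.67× reduction in CPU time.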


Instruction Count and CPI

• Instruction Count for a program
  – Determined by program, ISA and compiler
• Average cycles per instruction (CPI)
  – Determined by CPU hardware
  – If different instructions have different CPI
    • Average CPI affected by instruction mix

$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$


CPI Example

• Computer A: Cycle Time = 250 ps, CPI = 2.0
• Computer B: Cycle Time = 500 ps, CPI = 1.2
• Same ISA
• Which is faster, and by how much?
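Because both computers implement the same ISA, the same program has the same instruction count on each, so only CPI × cycle time matters. A minimal sketch of the comparison (the function name and the placeholder instruction count are assumptions for illustration):

```python
def cpu_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU Time = Instruction Count x CPI x Clock Cycle Time."""
    return instruction_count * cpi * cycle_time_s


# Same ISA and same program, so the instruction count is identical on
# both machines; the value 1.0 is a placeholder that cancels in the ratio.
ic = 1.0
time_a = cpu_time(ic, cpi=2.0, cycle_time_s=250e-12)  # 500 ps per instruction
time_b = cpu_time(ic, cpi=1.2, cycle_time_s=500e-12)  # 600 ps per instruction

faster = "A" if time_a < time_b else "B"
ratio = max(time_a, time_b) / min(time_a, time_b)
print(f"Computer {faster} is faster by a factor of {ratio:.1f}")
```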


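CPI in More Detail

• If different instruction classes take different numbers of cycles

$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$

• Weighted average CPI

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$

  where Instruction Count_i / Instruction Count is the relative frequency of class i.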


CPI Example

• Alternative compiled code sequences using instructions in classes A, B, C

Class              A    B    C
CPI for class      1    2    3
IC in sequence 1   2    1    2
IC in sequence 2   4    1    1

• Sequence 1: IC = 5
  – Clock Cycles = 2×1 + 1×2 + 2×3 = 10
  – Avg. CPI = 10/5 = 2.0
• Sequence 2: ?
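Both sequences can be evaluated with the same weighted-sum calculation. The sketch below encodes the table from the slide; the dictionary layout and function name are illustrative choices.

```python
CPI_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def average_cpi(instruction_counts: dict) -> tuple:
    """Return (instruction count, clock cycles, average CPI)."""
    ic = sum(instruction_counts.values())
    cycles = sum(CPI_PER_CLASS[c] * n for c, n in instruction_counts.items())
    return ic, cycles, cycles / ic


sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    ic, cycles, cpi = average_cpi(seq)
    print(f"{name}: IC = {ic}, clock cycles = {cycles}, avg. CPI = {cpi}")
```

Sequence 2 executes more instructions (6 versus 5) but needs fewer clock cycles (9 versus 10), giving an average CPI of 1.5.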


Performance Summary

• Performance depends on
  – Algorithm: affects IC, possibly CPI
  – Programming language: affects IC, CPI
  – Compiler: affects IC, CPI
  – Instruction set architecture: affects IC, CPI, Tc (clock cycle time)

The BIG Picture

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$


Power Trends

n In CMOS IC technology

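$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$

Annotations on the equation: power ×30, voltage 5 V → 1 V, frequency ×1000.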


Reducing Power

• Suppose a new CPU has
  – 85% of capacitive load of old CPU
  – 15% voltage and 15% frequency reduction

$$\frac{P_\text{new}}{P_\text{old}} = \frac{C_\text{old} \times 0.85 \times (V_\text{old} \times 0.85)^2 \times F_\text{old} \times 0.85}{C_\text{old} \times V_\text{old}^2 \times F_\text{old}} = 0.85^4 = 0.52$$

• The power wall
  – We can’t reduce voltage further
  – We can’t remove more heat
• How else can we improve performance?
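The power ratio for the new CPU can be verified numerically. This sketch assumes the relation Power = Capacitive load × Voltage² × Frequency from the Power Trends slide and normalizes the old CPU's values to 1, since they cancel in the ratio; the function name is illustrative.

```python
def dynamic_power(capacitive_load: float, voltage: float, frequency: float) -> float:
    """Power = Capacitive load x Voltage^2 x Frequency."""
    return capacitive_load * voltage ** 2 * frequency


# Normalized old CPU: the absolute values cancel in the ratio.
c_old, v_old, f_old = 1.0, 1.0, 1.0
p_old = dynamic_power(c_old, v_old, f_old)
p_new = dynamic_power(0.85 * c_old, 0.85 * v_old, 0.85 * f_old)

ratio = p_new / p_old
print(f"P_new / P_old = {ratio:.2f}")
```

Voltage counts twice because it is squared, which is why three 15% reductions yield a factor of 0.85⁴ rather than 0.85³.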


Uniprocessor Performance

Constrained by power, instruction-level parallelism, memory latency


Multiprocessors

• Multicore microprocessors
  – More than one processor per chip
• Requires explicitly parallel programming
  – Compare with instruction level parallelism
    • Hardware executes multiple instructions at once
    • Hidden from the programmer
  – Hard to do
    • Programming for performance
    • Load balancing
    • Optimizing communication and synchronization


SPEC CPU Benchmark

• Programs used to measure performance
  – Supposedly typical of actual workload
• Standard Performance Evaluation Corp (SPEC)
  – Develops benchmarks for CPU, I/O, Web, …
• SPEC CPU2006
  – Elapsed time to execute a selection of programs
    • Negligible I/O, so focuses on CPU performance
  – Normalize relative to reference machine
  – Summarize as geometric mean of performance ratios
    • CINT2006 (integer) and CFP2006 (floating-point)

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
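The geometric-mean summary can be sketched in a few lines. The ratios below are hypothetical values chosen for illustration, not SPEC results, and `math.prod` requires Python 3.8 or later.

```python
import math


def geometric_mean(ratios):
    """n-th root of the product of n execution time ratios."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty list of positive numbers")
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical ratios (reference time / measured time), not SPEC data.
example_ratios = [2.0, 8.0, 4.0]
print(f"geometric mean = {geometric_mean(example_ratios):.2f}")
```

A useful property of the geometric mean is that the ranking of two machines does not depend on which machine is used as the reference.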


CINT2006 for Intel Core i7 920


SPEC Power Benchmark

• Power consumption of server at different workload levels
  – Performance: ssj_ops/sec
  – Power: Watts (Joules/sec)

$$\text{Overall ssj\_ops per Watt} = \frac{\sum_{i=0}^{10} \text{ssj\_ops}_i}{\sum_{i=0}^{10} \text{power}_i}$$
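The overall metric divides total throughput by total power across the load levels, rather than averaging per-level ratios. A sketch with made-up readings (three levels instead of the benchmark's eleven; all numbers are hypothetical):

```python
def overall_ssj_ops_per_watt(ssj_ops, power_watts):
    """Sum of ssj_ops over all load levels divided by sum of power."""
    if len(ssj_ops) != len(power_watts):
        raise ValueError("need one power reading per load level")
    return sum(ssj_ops) / sum(power_watts)


# Hypothetical readings for three load levels (100%, 50%, 0%);
# the real benchmark uses eleven levels, i = 0..10.
ops = [900_000, 450_000, 0]
watts = [250.0, 150.0, 50.0]
print(f"{overall_ssj_ops_per_watt(ops, watts):.0f} ssj_ops per Watt")
```

Note that the idle level contributes power but no operations, so high idle power lowers the overall score.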


SPECpower_ssj2008 for Xeon X5650


Pitfall: Amdahl’s Law

• Improving an aspect of a computer and expecting a proportional improvement in overall performance

$$T_\text{improved} = \frac{T_\text{affected}}{\text{improvement factor}} + T_\text{unaffected}$$

• Example: multiply accounts for 80 s/100 s
  – How much improvement in multiply performance to get 5× overall?

$$20 = \frac{80}{n} + 20$$

  – Can’t be done!
• Corollary: make the common case fast
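Trying a few improvement factors shows why the 5× target is out of reach. This is a small sketch of Amdahl's Law applied to the multiply example; the function name and the chosen factors are illustrative.

```python
def improved_time(t_affected: float, t_unaffected: float, factor: float) -> float:
    """T_improved = T_affected / improvement factor + T_unaffected."""
    return t_affected / factor + t_unaffected


# Slide example: multiply takes 80 s of a 100 s run; target is 5x overall (20 s).
for factor in (2, 10, 100, 1_000_000):
    t = improved_time(t_affected=80.0, t_unaffected=20.0, factor=factor)
    print(f"multiply {factor}x faster: {t:.4f} s total, {100.0 / t:.3f}x overall")

# The unaffected 20 s is a floor on the total time, so the overall
# speedup approaches but never reaches 100 / 20 = 5.
```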


Fallacy: Low Power at Idle

• Look back at i7 power benchmark
  – At 100% load: 258 W
  – At 50% load: 170 W (66%)
  – At 10% load: 121 W (47%)
• Google data center
  – Mostly operates at 10% – 50% load
  – At 100% load less than 1% of the time
• Consider designing processors to make power proportional to load
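The percentages in parentheses are each reading's share of peak power, which a short check reproduces. The function name below is an illustrative choice.

```python
PEAK_WATTS = 258.0  # power at 100% load, from the slide


def share_of_peak(watts: float, peak_watts: float = PEAK_WATTS) -> float:
    """Power drawn at a given load as a percentage of peak power."""
    return 100.0 * watts / peak_watts


for load_percent, watts in ((50, 170.0), (10, 121.0)):
    print(f"{load_percent}% load draws {share_of_peak(watts):.0f}% of peak power")
```

If power were proportional to load, the two lines would read 50% and 10%; the gap is the point of the fallacy.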


Pitfall: MIPS as a Performance Metric

• MIPS: Millions of Instructions Per Second
  – Doesn’t account for
    • Differences in ISAs between computers
    • Differences in complexity between instructions

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$

• CPI varies between programs on a given CPU


An Error in Applying MIPS

• For a program X, a compiler generated the following instruction distribution:

Instruction type    Frequency    CPI
ALU operations      43%          1
Loads               21%          2
Stores              12%          2
Branches            24%          2

• With an optimizing compiler, 50% of the ALU instructions are eliminated (loads, stores, and branches are not reduced).
• With a 2 ns clock cycle (500 MHz clock rate), what is the MIPS rating of the optimized code and of the unoptimized code? Do the MIPS ratings agree with the execution times?
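One way to work the exercise is to express everything per original instruction, so the unknown instruction count cancels. The sketch below uses MIPS = Clock rate / (CPI × 10⁶); the dictionary layout and the function name `summarize` are illustrative assumptions.

```python
CLOCK_RATE_HZ = 500e6  # 2 ns clock cycle

# instruction class -> (fraction of original instruction count, CPI)
unoptimized = {"alu": (0.43, 1), "load": (0.21, 2), "store": (0.12, 2), "branch": (0.24, 2)}
# The optimizing compiler removes half of the ALU instructions only.
optimized = dict(unoptimized, alu=(0.43 / 2, 1))


def summarize(mix):
    """Return (CPI, MIPS, seconds per original instruction) for a mix."""
    instructions = sum(frac for frac, _ in mix.values())
    cycles = sum(frac * cpi for frac, cpi in mix.values())
    cpi = cycles / instructions
    mips = CLOCK_RATE_HZ / (cpi * 1e6)
    seconds = cycles / CLOCK_RATE_HZ
    return cpi, mips, seconds


for name, mix in (("unoptimized", unoptimized), ("optimized", optimized)):
    cpi, mips, seconds = summarize(mix)
    print(f"{name}: CPI = {cpi:.2f}, MIPS = {mips:.1f}, "
          f"time = {seconds * 1e9:.2f} ns per original instruction")
```

Under these assumptions the optimized code takes less time (about 2.71 ns versus 3.14 ns per original instruction) yet has the lower MIPS rating (about 289.7 versus 318.5), so the two measures disagree.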


MFLOPS, or megaflops

• A popular alternative for comparing different systems.
• Based on operations rather than instructions.
• The metric does not apply outside the domain of floating-point operations.
  – For a compiler: MFLOPS = 0.
• A program made up of 100% floating-point additions has a much higher MFLOPS rating than a program made up of 100% divisions.

$$\text{MFLOPS} = \frac{\text{FP operations}}{\text{Execution time} \times 10^6}$$
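The MFLOPS definition can be illustrated with a short calculation. All operation counts and timings below are hypothetical, chosen only to reflect that a floating-point divide typically takes longer than an add and that a compiler performs essentially no floating-point work.

```python
def mflops(fp_operations: float, execution_time_s: float) -> float:
    """MFLOPS = FP operations / (Execution time x 10^6)."""
    return fp_operations / (execution_time_s * 1e6)


# Hypothetical programs with the same operation count; the timings
# are made up to reflect that an FP divide is slower than an FP add.
adds_only = mflops(fp_operations=200e6, execution_time_s=1.0)
divides_only = mflops(fp_operations=200e6, execution_time_s=8.0)
no_fp_work = mflops(fp_operations=0, execution_time_s=3.0)  # e.g. a compiler

print(adds_only, divides_only, no_fp_work)
```

The same machine scores very differently depending on the operation mix, which is why a single MFLOPS figure says little about general performance.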


Concluding Remarks

• Execution time: the best performance measure
• Power is a limiting factor
  – Use parallelism to improve performance


Homework: Performance problems posted on the course web page.

Due: