computer technology - michigan state universitycse820/lectures/lecturesf12/caqa5e_ch1.pdf · 2012....


Page 1: Computer Technology - Michigan State Universitycse820/lectures/lecturesF12/CAQA5e_ch1.pdf · 2012. 9. 9. · The University of Adelaide, School of Computer Science 9 September 2012

The University of Adelaide, School of Computer Science 9 September 2012

Chapter 2 — Instructions: Language of the Computer 1

1 Copyright © 2012, Elsevier Inc. All rights reserved.

Chapter 1

Fundamentals of Quantitative Design and Analysis

Computer Architecture A Quantitative Approach, Fifth Edition


Computer Technology
- Performance improvements:
  - Improvements in semiconductor technology
    - Feature size, clock speed
  - Improvements in computer architectures
    - Enabled by HLL compilers, UNIX
    - Led to RISC architectures
- Together have enabled:
  - Lightweight computers
  - Productivity-based managed/interpreted programming languages

Introduction


Single Processor Performance

[Figure: single processor performance plot, with annotations "RISC" and "Move to multi-processor"]

Introduction


Current Trends in Architecture
- Cannot continue to leverage instruction-level parallelism (ILP)
  - Single processor performance improvement ended in 2003
- New models for performance:
  - Data-level parallelism (DLP)
  - Thread-level parallelism (TLP)
  - Request-level parallelism (RLP)
- These require explicit restructuring of the application

Introduction


Classes of Computers
- Personal Mobile Device (PMD)
  - e.g. smart phones, tablet computers
  - Emphasis on energy efficiency and real-time performance
- Desktop Computing
  - Emphasis on price-performance
- Servers
  - Emphasis on availability, scalability, throughput
- Clusters / Warehouse Scale Computers
  - Used for “Software as a Service (SaaS)”
  - Emphasis on availability and price-performance
  - Sub-class: Supercomputers, emphasis: floating-point performance and fast internal networks
- Embedded Computers
  - Emphasis: price


Parallelism
- Classes of parallelism in applications:
  - Data-Level Parallelism (DLP)
  - Task-Level Parallelism (TLP)
- Classes of architectural parallelism:
  - Instruction-Level Parallelism (ILP)
  - Vector architectures / Graphics Processor Units (GPUs)
  - Thread-Level Parallelism
  - Request-Level Parallelism

Classes of Computers


Flynn’s Taxonomy
- Single instruction stream, single data stream (SISD)
- Single instruction stream, multiple data streams (SIMD)
  - Vector architectures
  - Multimedia extensions
  - Graphics processor units
- Multiple instruction streams, single data stream (MISD)
  - No commercial implementation
- Multiple instruction streams, multiple data streams (MIMD)
  - Tightly-coupled MIMD
  - Loosely-coupled MIMD

Classes of Computers


Defining Computer Architecture
- “Old” view of computer architecture:
  - Instruction Set Architecture (ISA) design
  - i.e. decisions regarding:
    - registers, memory addressing, addressing modes, instruction operands, available operations, control flow instructions, instruction encoding
- “Real” computer architecture:
  - Specific requirements of the target machine
  - Design to maximize performance within constraints: cost, power, and availability
  - Includes ISA, microarchitecture, hardware


Trends in Technology
- Integrated circuit technology
  - Transistor density: 35%/year
  - Die size: 10-20%/year
  - Integration overall: 40-55%/year
- DRAM capacity: 25-40%/year (slowing)
- Flash capacity: 50-60%/year
  - 15-20X cheaper/bit than DRAM
- Magnetic disk technology: 40%/year
  - 15-25X cheaper/bit than Flash
  - 300-500X cheaper/bit than DRAM

Trends in Technology
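To get a feel for what the annual growth rates on this slide imply, a short sketch can convert a rate into a doubling time and a compounded factor. The function names are hypothetical, and the sketch assumes each rate stays constant from year to year, which the slide itself does not claim (it notes DRAM growth is slowing).

```python
import math


def doubling_time_years(annual_growth):
    """Years needed to double at a constant annual growth rate (0.35 means 35%/year)."""
    return math.log(2) / math.log(1 + annual_growth)


def growth_factor(annual_growth, years):
    """Total multiplicative growth after compounding for the given number of years."""
    return (1 + annual_growth) ** years


# Transistor density at 35%/year, magnetic disk capacity at 40%/year (slide figures).
density_doubling = doubling_time_years(0.35)
disk_decade = growth_factor(0.40, 10)

print(f"Density doubles roughly every {density_doubling:.2f} years")
print(f"Disk capacity grows about {disk_decade:.1f}x over a decade")
```

Under the constant-rate assumption, 35%/year corresponds to a doubling time of a little over two years.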


Bandwidth and Latency
- Bandwidth or throughput
  - Total work done in a given time
  - 10,000-25,000X improvement for processors
  - 300-1200X improvement for memory and disks
- Latency or response time
  - Time between start and completion of an event
  - 30-80X improvement for processors
  - 6-8X improvement for memory and disks

Trends in Technology


Bandwidth and Latency

Log-log plot of bandwidth and latency milestones

Trends in Technology


Transistors and Wires
- Feature size
  - Minimum size of transistor or wire in x or y dimension
  - 10 microns in 1971 to 0.032 microns in 2011
  - Transistor performance scales linearly
    - Wire delay does not improve with feature size!
  - Integration density scales quadratically

Trends in Technology
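The linear and quadratic scaling claims can be made concrete with the two feature sizes quoted on the slide. This is a rough sketch with a hypothetical helper name. It treats "linear" and "quadratic" as exact proportionalities to the shrink in feature size, which is an idealization.

```python
def scaling_factors(old_feature_um, new_feature_um):
    """Return (linear shrink factor, density factor) between two feature sizes in microns."""
    linear = old_feature_um / new_feature_um
    density = linear ** 2  # area per transistor shrinks in both x and y
    return linear, density


# 10 microns (1971) to 0.032 microns (2011), as quoted on the slide.
linear, density = scaling_factors(10.0, 0.032)

print(f"Linear shrink: about {linear:.1f}x")
print(f"Ideal integration density gain: about {density:,.0f}x")
```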


Power and Energy
- Problem: Get power in, get power out
- Thermal Design Power (TDP)
  - Characterizes sustained power consumption
  - Used as target for power supply and cooling system
  - Lower than peak power, higher than average power consumption
- Clock rate can be reduced dynamically to limit power consumption
- Energy per task is often a better measurement

Trends in Power and Energy


Dynamic Energy and Power
- Dynamic energy
  - Transistor switch from 0 -> 1 or 1 -> 0
  - ½ x Capacitive load x Voltage²
- Dynamic power
  - ½ x Capacitive load x Voltage² x Frequency switched
- Reducing clock rate reduces power, not energy

Trends in Power and Energy
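The two dynamic formulas on this slide can be compared numerically. The sketch below uses normalized units and hypothetical function names. The 15% voltage and frequency reduction is an illustrative assumption, not a figure from the slide.

```python
def dynamic_energy(cap_load, voltage):
    """Energy of one 0->1 or 1->0 transition: 1/2 x capacitive load x voltage^2."""
    return 0.5 * cap_load * voltage ** 2


def dynamic_power(cap_load, voltage, freq_switched):
    """Power: 1/2 x capacitive load x voltage^2 x frequency switched."""
    return 0.5 * cap_load * voltage ** 2 * freq_switched


base_energy = dynamic_energy(1.0, 1.0)
base_power = dynamic_power(1.0, 1.0, 1.0)

# Halving only the clock rate halves power; energy per transition is unchanged.
slow_clock_power_ratio = dynamic_power(1.0, 1.0, 0.5) / base_power
slow_clock_energy_ratio = dynamic_energy(1.0, 1.0) / base_energy

# Lowering voltage and frequency together by 15% reduces both energy and power.
dvfs_energy_ratio = dynamic_energy(1.0, 0.85) / base_energy
dvfs_power_ratio = dynamic_power(1.0, 0.85, 0.85) / base_power

print(slow_clock_power_ratio, slow_clock_energy_ratio)
print(round(dvfs_energy_ratio, 4), round(dvfs_power_ratio, 4))
```

Because energy depends on voltage but not on frequency, only a voltage reduction lowers the energy of a fixed task in this model.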


Power
- Intel 80386 consumed ~2 W
- 3.3 GHz Intel Core i7 consumes 130 W
- Heat must be dissipated from 1.5 x 1.5 cm chip
- This is the limit of what can be cooled by air

Trends in Power and Energy


Reducing Power
- Techniques for reducing power:
  - Do nothing well
  - Dynamic Voltage-Frequency Scaling
  - Low power state for DRAM, disks
  - Overclocking, turning off cores

Trends in Power and Energy


Static Power
- Static power consumption
  - Current_static x Voltage
  - Scales with number of transistors
  - To reduce: power gating

Trends in Power and Energy


Trends in Cost
- Cost driven down by learning curve
  - Yield
- DRAM: price closely tracks cost
- Microprocessors: price depends on volume
  - 10% less for each doubling of volume

Trends in Cost
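The "10% less for each doubling of volume" rule of thumb compounds. The sketch below uses a hypothetical function name and assumes the 10% figure applies uniformly to every doubling.

```python
import math


def relative_price(volume_ratio, drop_per_doubling=0.10):
    """Price relative to the baseline after volume grows by volume_ratio."""
    doublings = math.log2(volume_ratio)
    return (1 - drop_per_doubling) ** doublings


# Eight times the volume is three doublings: 0.9 * 0.9 * 0.9.
price_at_8x = relative_price(8)
print(f"Relative price at 8x volume: {price_at_8x:.3f}")
```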


Integrated Circuit Cost
- Integrated circuit
- Bose-Einstein formula:
  - Defects per unit area = 0.016-0.057 defects per square cm (2010)
  - N = process-complexity factor = 11.5-15.5 (40 nm, 2010)

Trends in Cost
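The Bose-Einstein die-yield model, as it is usually stated in the textbook these slides accompany, is Die yield = Wafer yield x 1 / (1 + Defects per unit area x Die area)^N. The sketch below evaluates it with a hypothetical function name. The die area, and the specific defect density and N picked from within the slide's ranges, are illustrative assumptions.

```python
def die_yield(defects_per_cm2, die_area_cm2, n, wafer_yield=1.0):
    """Bose-Einstein die-yield model: wafer_yield / (1 + defects * area) ** n."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n


# Assumed inputs: a 1.5 cm x 1.5 cm die, 0.031 defects per square cm, N = 13.5.
large_die = die_yield(0.031, 1.5 * 1.5, 13.5)
small_die = die_yield(0.031, 1.0 * 1.0, 13.5)

print(f"Yield for 2.25 cm^2 die: {large_die:.2f}")
print(f"Yield for 1.00 cm^2 die: {small_die:.2f}")
```

In this model yield falls quickly as die area grows, which is one reason die size weighs so heavily on integrated circuit cost.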


Dependability
- Module reliability
  - Mean time to failure (MTTF)
  - Mean time to repair (MTTR)
  - Mean time between failures (MTBF) = MTTF + MTTR
  - Availability = MTTF / MTBF

Dependability
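The availability definition above is simple to evaluate. The MTTF and MTTR values in this sketch are made-up illustrative numbers, and the function name is hypothetical.

```python
def availability(mttf_hours, mttr_hours):
    """Availability = MTTF / MTBF, where MTBF = MTTF + MTTR."""
    mtbf_hours = mttf_hours + mttr_hours
    return mttf_hours / mtbf_hours


# Assumed example: a module with a 1,000,000-hour MTTF and a 24-hour MTTR.
module_availability = availability(1_000_000, 24)
print(f"Availability: {module_availability:.6f}")
```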


Measuring Performance
- Typical performance metrics:
  - Response time
  - Throughput
- Speedup of X relative to Y
  - Execution time_Y / Execution time_X
- Execution time
  - Wall clock time: includes all system overheads
  - CPU time: only computation time
- Benchmarks
  - Kernels (e.g. matrix multiply)
  - Toy programs (e.g. sorting)
  - Synthetic benchmarks (e.g. Dhrystone)
  - Benchmark suites (e.g. SPEC06fp, TPC-C)
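The distinction between wall clock time and CPU time, and the speedup ratio, can be illustrated with Python's standard `time` module. The workload and helper names are hypothetical. The sleep stands in for any waiting, such as I/O, that counts toward wall clock time but not CPU time.

```python
import time


def speedup(exec_time_y, exec_time_x):
    """Speedup of X relative to Y = Execution time_Y / Execution time_X."""
    return exec_time_y / exec_time_x


def measure(fn):
    """Return (wall clock seconds, CPU seconds) spent running fn once."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def workload():
    sum(i * i for i in range(100_000))  # computation: counts toward both clocks
    time.sleep(0.05)                    # waiting: counts toward wall clock only


wall, cpu = measure(workload)
print(f"wall clock: {wall:.3f} s, CPU: {cpu:.3f} s")
print("speedup of X (5 s) relative to Y (10 s):", speedup(10.0, 5.0))
```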

- Look at SPEC handout Fig. 1.16


Principles of Computer Design
- Take Advantage of Parallelism
  - e.g. multiple processors, disks, memory banks, pipelining, multiple functional units
- Principle of Locality
  - Reuse of data and instructions (more on next slides)
- Focus on the Common Case
  - Amdahl’s Law (more on next slides)

Principles


The Principle of Locality
- The Principle of Locality:
  - Programs access a relatively small portion of the address space at any instant of time.
- Two Different Types of Locality:
  - Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
  - Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)
- For the last 30 years, hardware has relied on locality for memory performance

[Figure: processor (P), cache ($), and memory (MEM)]
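One way to see why caches depend on locality is to count hits in a toy cache model. The sketch below simulates a hypothetical direct-mapped cache. The 64-byte block size, 256 blocks, and the three address traces are assumptions chosen for illustration, not parameters from the slides.

```python
def hit_rate(addresses, block_size=64, num_blocks=256):
    """Hit rate of a toy direct-mapped cache over a trace of byte addresses."""
    tags = [None] * num_blocks
    hits = 0
    for addr in addresses:
        block = addr // block_size   # which memory block holds this byte
        index = block % num_blocks   # the single cache slot that block maps to
        if tags[index] == block:
            hits += 1
        else:
            tags[index] = block      # miss: bring the block into the cache
    return hits / len(addresses)


# Spatial locality: consecutive 4-byte elements share a 64-byte block.
sequential = [4 * i for i in range(4096)]
# No spatial locality: every access touches a different block.
strided = [64 * i for i in range(4096)]
# Temporal locality: the same four blocks are revisited 100 times.
reused = [0, 64, 128, 192] * 100

print(hit_rate(sequential), hit_rate(strided), hit_rate(reused))
```

In this model the sequential trace misses only once per block, the strided trace misses on every access, and the reused trace misses only on its first pass.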


Levels of the Memory Hierarchy

Level            Capacity           Access Time               Cost
CPU Registers    100s Bytes         300-500 ps (0.3-0.5 ns)
L1 and L2 Cache  10s-100s K Bytes   ~1 ns - ~10 ns            $1000s / GByte
Main Memory      G Bytes            80 ns - 200 ns            ~$100 / GByte
Disk             10s T Bytes        10 ms (10,000,000 ns)     ~$1 / GByte
Tape             infinite           sec-min                   ~$1 / GByte

Transfer between levels   Xfer Unit         Staging managed by   Size
Registers <-> L1 Cache    Instr. Operands   prog./compiler       1-8 bytes
L1 Cache <-> L2 Cache     Blocks            cache cntl           32-64 bytes
L2 Cache <-> Memory       Blocks            cache cntl           64-128 bytes
Memory <-> Disk           Pages             OS                   4K-8K bytes
Disk <-> Tape             Files             user/operator        Mbytes

Upper levels are faster; lower levels are larger.


Focus on the Common Case
- Common sense guides computer design
  - Since it's engineering, common sense is valuable
- In making a design trade-off, favor the frequent case over the infrequent case
  - E.g., the instruction fetch and decode unit is used more frequently than the multiplier, so optimize it first
  - E.g., if a database server has 50 disks per processor, storage dependability dominates system dependability, so optimize it first
- The frequent case is often simpler and can be done faster than the infrequent case
  - E.g., overflow is rare when adding two numbers, so improve performance by optimizing the more common case of no overflow
  - This may slow down overflow handling, but overall performance improves by optimizing for the normal case
- What the frequent case is, and how much performance improves by making that case faster => Amdahl’s Law


Amdahl’s Law

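\[
\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[ \left(1 - \text{Fraction}_{\text{enhanced}}\right) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right]
\]

\[
\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{\left(1 - \text{Fraction}_{\text{enhanced}}\right) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}
\]

Best you could ever hope to do:

\[
\text{Speedup}_{\text{maximum}} = \frac{1}{1 - \text{Fraction}_{\text{enhanced}}}
\]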
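Amdahl's Law is easy to evaluate directly. The sketch below uses hypothetical function names. The 40% fraction and 10x enhancement are made-up inputs chosen for illustration.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - fraction) + fraction / speedup_enhanced)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def max_speedup(fraction_enhanced):
    """Upper bound as the enhancement speedup grows without limit."""
    return 1.0 / (1.0 - fraction_enhanced)


# Assumed example: the enhancement applies to 40% of execution time and is 10x faster.
overall = amdahl_speedup(0.4, 10.0)
bound = max_speedup(0.4)

print(f"Overall speedup: {overall:.4f}")
print(f"Best possible:   {bound:.4f}")
```

Even a 10x improvement yields only about 1.56x overall here, because the unenhanced 60% of execution time dominates.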


Principles of Computer Design
- The Processor Performance Equation

Principles
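The processor performance equation is commonly written as CPU time = Instruction count x Cycles per instruction (CPI) x Clock cycle time, where clock cycle time is the reciprocal of clock rate. The sketch below evaluates that form. The function name and the instruction count, CPI, and clock rate are illustrative assumptions.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count x CPI x clock cycle time (1 / clock rate)."""
    clock_cycle_time = 1.0 / clock_rate_hz
    return instruction_count * cpi * clock_cycle_time


# Assumed example: one billion instructions, CPI of 1.5, 3.3 GHz clock.
t = cpu_time_seconds(1_000_000_000, 1.5, 3.3e9)
print(f"CPU time: {t:.4f} s")
```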


Principles of Computer Design
- Different instruction types have different CPIs

Principles
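When instruction types have different CPIs, total clock cycles are usually computed as the sum over types of (instruction count x CPI), and the overall CPI is that total divided by the total instruction count. The sketch below uses hypothetical names and an invented instruction mix.

```python
def total_cycles(mix):
    """Sum of instruction count x CPI over all instruction types."""
    return sum(count * cpi for count, cpi in mix.values())


def average_cpi(mix):
    """Overall CPI: total cycles divided by total instruction count."""
    total_instructions = sum(count for count, _ in mix.values())
    return total_cycles(mix) / total_instructions


# Assumed mix: type name -> (instruction count, CPI for that type).
mix = {
    "alu": (500, 1.0),
    "load_store": (300, 2.0),
    "branch": (200, 3.0),
}

cycles = total_cycles(mix)
cpi = average_cpi(mix)
print(f"Total cycles: {cycles:.0f}, average CPI: {cpi:.2f}")
```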