computer technology - michigan state university, cse820/lectures/lecturesf12/caqa5e_ch1.pdf · 2012....
The University of Adelaide, School of Computer Science 9 September 2012
Chapter 2 — Instructions: Language of the Computer 1
1 Copyright © 2012, Elsevier Inc. All rights reserved.
Chapter 1
Fundamentals of Quantitative Design and Analysis
Computer Architecture: A Quantitative Approach, Fifth Edition
Computer Technology
- Performance improvements:
  - Improvements in semiconductor technology
    - Feature size, clock speed
  - Improvements in computer architectures
    - Enabled by HLL compilers, UNIX
    - Led to RISC architectures
- Together have enabled:
  - Lightweight computers
  - Productivity-based managed/interpreted programming languages
Introduction
Single Processor Performance
[Figure: single-processor performance over time, annotated "RISC" and "Move to multi-processor"]
Current Trends in Architecture
- Cannot continue to leverage instruction-level parallelism (ILP)
  - Single-processor performance improvement ended in 2003
- New models for performance:
  - Data-level parallelism (DLP)
  - Thread-level parallelism (TLP)
  - Request-level parallelism (RLP)
- These require explicit restructuring of the application
Classes of Computers
- Personal Mobile Device (PMD)
  - e.g. smart phones, tablet computers
  - Emphasis on energy efficiency and real-time performance
- Desktop Computing
  - Emphasis on price-performance
- Servers
  - Emphasis on availability, scalability, throughput
- Clusters / Warehouse-Scale Computers
  - Used for “Software as a Service (SaaS)”
  - Emphasis on availability and price-performance
  - Sub-class: Supercomputers, emphasis on floating-point performance and fast internal networks
- Embedded Computers
  - Emphasis: price
Classes of Computers
Parallelism
- Classes of parallelism in applications:
  - Data-Level Parallelism (DLP)
  - Task-Level Parallelism (TLP)
- Classes of architectural parallelism:
  - Instruction-Level Parallelism (ILP)
  - Vector architectures/Graphics Processor Units (GPUs)
  - Thread-Level Parallelism
  - Request-Level Parallelism
Flynn’s Taxonomy
- Single instruction stream, single data stream (SISD)
- Single instruction stream, multiple data streams (SIMD)
  - Vector architectures
  - Multimedia extensions
  - Graphics processor units
- Multiple instruction streams, single data stream (MISD)
  - No commercial implementation
- Multiple instruction streams, multiple data streams (MIMD)
  - Tightly-coupled MIMD
  - Loosely-coupled MIMD
Defining Computer Architecture
- “Old” view of computer architecture:
  - Instruction Set Architecture (ISA) design
  - i.e. decisions regarding: registers, memory addressing, addressing modes, instruction operands, available operations, control flow instructions, instruction encoding
- “Real” computer architecture:
  - Specific requirements of the target machine
  - Design to maximize performance within constraints: cost, power, and availability
  - Includes ISA, microarchitecture, hardware
Defining Computer Architecture
Trends in Technology
- Integrated circuit technology
  - Transistor density: 35%/year
  - Die size: 10-20%/year
  - Integration overall: 40-55%/year
- DRAM capacity: 25-40%/year (slowing)
- Flash capacity: 50-60%/year
  - 15-20X cheaper/bit than DRAM
- Magnetic disk technology: 40%/year
  - 15-25X cheaper/bit than Flash
  - 300-500X cheaper/bit than DRAM
Trends in Technology
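Annual growth rates are easier to compare when they are converted to doubling times, years = ln 2 / ln(1 + r). The sketch below applies that compound-growth formula to the rates listed above. The `doubling_time` helper and the `trends` table are illustrative names introduced here; only the rates come from the slide.

```python
import math

def doubling_time(annual_rate: float) -> float:
    """Years for a quantity growing at `annual_rate` (0.35 means 35%/year) to double."""
    return math.log(2) / math.log(1 + annual_rate)

# (low, high) annual growth rates from the slide; single values repeat.
trends = {
    "Transistor density": (0.35, 0.35),
    "Integration overall": (0.40, 0.55),
    "DRAM capacity": (0.25, 0.40),
    "Flash capacity": (0.50, 0.60),
    "Magnetic disk": (0.40, 0.40),
}

for name, (low, high) in trends.items():
    # Faster growth means a shorter doubling time, so the high rate gives the lower bound.
    print(f"{name}: doubles every {doubling_time(high):.1f}-{doubling_time(low):.1f} years")
```

For example, 35%/year transistor density growth corresponds to a doubling roughly every 2.3 years.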
Bandwidth and Latency
- Bandwidth or throughput
  - Total work done in a given time
  - 10,000-25,000X improvement for processors
  - 300-1200X improvement for memory and disks
- Latency or response time
  - Time between start and completion of an event
  - 30-80X improvement for processors
  - 6-8X improvement for memory and disks
Bandwidth and Latency
Log-log plot of bandwidth and latency milestones
Transistors and Wires
- Feature size
  - Minimum size of transistor or wire in x or y dimension
  - 10 microns in 1971 to 0.032 microns in 2011
  - Transistor performance scales linearly
    - Wire delay does not improve with feature size!
  - Integration density scales quadratically
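The slide states that transistor performance scales linearly with feature size while integration density scales quadratically. Applying both rules to the 1971 and 2011 feature sizes gives the idealized factors below. This is a back-of-the-envelope sketch; it ignores, among other things, the caveat that wire delay does not improve.

```python
# Feature sizes from the slide, in microns.
feature_1971 = 10.0
feature_2011 = 0.032

# A feature is this many times smaller in each of x and y.
linear_scale = feature_1971 / feature_2011
# Shrinking in both x and y packs linear_scale^2 more transistors into the same area.
density_scale = linear_scale ** 2

print(f"Linear shrink: {linear_scale:.1f}x")
print(f"Density gain:  {density_scale:,.0f}x")
```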
Power and Energy
- Problem: Get power in, get power out
- Thermal Design Power (TDP)
  - Characterizes sustained power consumption
  - Used as target for power supply and cooling system
  - Lower than peak power, higher than average power consumption
- Clock rate can be reduced dynamically to limit power consumption
- Energy per task is often a better measurement
Trends in Power and Energy
Dynamic Energy and Power
- Dynamic energy
  - Transistor switch from 0 -> 1 or 1 -> 0
  - ½ × Capacitive load × Voltage²
- Dynamic power
  - ½ × Capacitive load × Voltage² × Frequency switched
- Reducing clock rate reduces power, not energy
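The two formulas above explain why reducing the clock rate reduces power but not energy. A task needs a fixed number of transitions, so its energy does not depend on frequency. The sketch below encodes both formulas and compares a halved clock with scaling voltage and frequency down together (DVFS). The capacitance, voltage, frequency, transition count, and the 15% DVFS step are all hypothetical values chosen for illustration.

```python
def dynamic_energy(capacitive_load: float, voltage: float) -> float:
    """Energy of one 0->1 or 1->0 transition: 1/2 x C x V^2 (joules)."""
    return 0.5 * capacitive_load * voltage ** 2

def dynamic_power(capacitive_load: float, voltage: float, frequency: float) -> float:
    """Dynamic power: 1/2 x C x V^2 x f (watts)."""
    return dynamic_energy(capacitive_load, voltage) * frequency

def task_energy(capacitive_load: float, voltage: float, transitions: float) -> float:
    """Energy of a task with a fixed number of transitions; frequency does not appear."""
    return dynamic_energy(capacitive_load, voltage) * transitions

# Hypothetical operating point, for illustration only.
C, V, F = 1e-9, 1.0, 3e9
TRANSITIONS = 6e9

base_power = dynamic_power(C, V, F)
half_clock_power = dynamic_power(C, V, F / 2)   # halving f halves power...
same_energy = task_energy(C, V, TRANSITIONS)    # ...but the task's energy is unchanged

# DVFS: lower voltage and frequency together by 15%, so power scales by 0.85^3.
# Energy per transition also drops, by 0.85^2, because it depends on V.
dvfs_power = dynamic_power(C, 0.85 * V, 0.85 * F)

print(f"Half clock: {half_clock_power / base_power:.2f} of baseline power")
print(f"DVFS -15%:  {dvfs_power / base_power:.3f} of baseline power")
print(f"Task energy at any clock rate: {same_energy:.1f} J")
```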
Power
- Intel 80386 consumed ~2 W
- 3.3 GHz Intel Core i7 consumes 130 W
- Heat must be dissipated from 1.5 x 1.5 cm chip
- This is the limit of what can be cooled by air
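To put the Core i7 figure in perspective, spreading 130 W evenly over a 1.5 × 1.5 cm die gives an average power density of roughly 58 W/cm². This is a simple average; real chips have hot spots well above it.

$$
\frac{130\ \text{W}}{1.5\ \text{cm} \times 1.5\ \text{cm}} = \frac{130\ \text{W}}{2.25\ \text{cm}^2} \approx 58\ \text{W/cm}^2
$$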
Reducing Power
- Techniques for reducing power:
  - Do nothing well
  - Dynamic Voltage-Frequency Scaling
  - Low power state for DRAM, disks
  - Overclocking, turning off cores
Static Power
- Static power consumption
  - Power_static = Current_static × Voltage
  - Scales with number of transistors
  - To reduce: power gating
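Static power is leakage current times voltage, and the slide notes that it scales with the number of transistors. That is why power gating, which cuts the supply to idle blocks, reduces it. The sketch below assumes that leakage is simply proportional to the fraction of transistors that stay powered. That proportionality, the 20 A full-chip leakage, and the 1.0 V supply are all hypothetical.

```python
def static_power(static_current: float, voltage: float) -> float:
    """Static (leakage) power: Current_static x Voltage, in watts."""
    return static_current * voltage

# Hypothetical chip: full leakage current in amps, and supply voltage in volts.
FULL_LEAKAGE_A = 20.0
VOLTAGE = 1.0

def gated_static_power(fraction_powered: float) -> float:
    """Static power when only `fraction_powered` of the transistors stay powered.
    Assumes leakage is proportional to powered transistors; power gating removes
    the rest from the supply, so they stop leaking."""
    return static_power(FULL_LEAKAGE_A * fraction_powered, VOLTAGE)

print(gated_static_power(1.0))   # every block powered
print(gated_static_power(0.25))  # e.g. three of four equal-sized cores gated off
```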
Trends in Cost
- Cost driven down by learning curve
  - Yield
- DRAM: price closely tracks cost
- Microprocessors: price depends on volume
  - 10% less for each doubling of volume
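The "10% less for each doubling of volume" rule compounds. After k doublings, the price is 0.9^k of the baseline. The sketch below applies the rule to a few volume ratios. The `relative_price` helper is a hypothetical name, and the model assumes the rule holds exactly at every volume.

```python
import math

def relative_price(volume_ratio: float, drop_per_doubling: float = 0.10) -> float:
    """Price relative to a baseline volume, assuming a fixed fractional drop
    for each doubling of volume (the slide's rule of thumb: 10%)."""
    doublings = math.log2(volume_ratio)
    return (1 - drop_per_doubling) ** doublings

for ratio in (2, 4, 8, 1024):
    print(f"{ratio:>5}x volume -> {relative_price(ratio):.3f} of baseline price")
```

A 1024x increase in volume is ten doublings, so under this rule the price falls to about 35% of the baseline.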
Trends in Cost
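Integrated Circuit Cost
- Integrated circuit

$$
\text{Cost of IC} = \frac{\text{Cost of die} + \text{Cost of testing die} + \text{Cost of packaging and final test}}{\text{Final test yield}}
$$

$$
\text{Cost of die} = \frac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}}
$$

$$
\text{Dies per wafer} = \frac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} - \frac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}}
$$

- Bose-Einstein formula:

$$
\text{Die yield} = \text{Wafer yield} \times \frac{1}{(1 + \text{Defects per unit area} \times \text{Die area})^{N}}
$$

  - Defects per unit area = 0.016-0.057 defects per square cm (2010)
  - N = process-complexity factor = 11.5-15.5 (40 nm, 2010)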
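The die-cost chain (dies per wafer, Bose-Einstein die yield, cost per good die) can be sketched directly. The wafer diameter and wafer cost below are hypothetical. D = 0.031 defects/cm² and N = 13.5 are chosen from inside the 2010 ranges on the slide, and wafer yield is assumed to be 100%.

```python
import math

def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Wafer area / die area, minus an estimate of the partial dies lost at the edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss

def die_yield(defects_per_cm2: float, die_area_cm2: float, n: float,
              wafer_yield: float = 1.0) -> float:
    """Bose-Einstein model: wafer_yield x 1 / (1 + D x A)^N."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n

def die_cost(wafer_cost: float, wafer_diameter_cm: float, die_area_cm2: float,
             defects_per_cm2: float, n: float) -> float:
    """Cost of die = cost of wafer / (dies per wafer x die yield)."""
    good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(defects_per_cm2, die_area_cm2, n))
    return wafer_cost / good_dies

# Hypothetical 30 cm wafer costing $5000.
D, N = 0.031, 13.5
for side_cm in (1.0, 1.5):
    area = side_cm ** 2
    print(f"{side_cm} cm die: {dies_per_wafer(30, area):.0f} dies/wafer, "
          f"yield {die_yield(D, area, N):.2f}, "
          f"cost ${die_cost(5000, 30, area, D, N):.2f}")
```

Growing the die from 1.0 to 1.5 cm on a side more than doubles its area. It also drops the yield from about 0.66 to about 0.40, so the cost per good die rises much faster than the area.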
Dependability
- Module reliability
  - Mean time to failure (MTTF)
  - Mean time to repair (MTTR)
  - Mean time between failures (MTBF) = MTTF + MTTR
  - Availability = MTTF / MTBF
Dependability
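The MTBF and availability definitions above translate directly into code. The module's 1,000,000-hour MTTF and 24-hour MTTR are hypothetical values used only to show the arithmetic.

```python
def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures = MTTF + MTTR."""
    return mttf_hours + mttr_hours

def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the module is up: MTTF / MTBF."""
    return mttf_hours / mtbf(mttf_hours, mttr_hours)

# Hypothetical module: fails on average every 1,000,000 hours, takes 24 hours to repair.
a = availability(1_000_000, 24)
print(f"Availability: {a:.6f}")
print(f"Expected downtime per year: {(1 - a) * 365 * 24:.2f} hours")
```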
Measuring Performance
- Typical performance metrics:
  - Response time
  - Throughput
- Speedup of X relative to Y
  - Execution time_Y / Execution time_X
- Execution time
  - Wall clock time: includes all system overheads
  - CPU time: only computation time
- Benchmarks
  - Kernels (e.g. matrix multiply)
  - Toy programs (e.g. sorting)
  - Synthetic benchmarks (e.g. Dhrystone)
  - Benchmark suites (e.g. SPEC06fp, TPC-C)
Measuring Performance
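The difference between wall-clock time and CPU time shows up as soon as a program waits. The sketch below times two stand-in tasks with Python's `time.perf_counter` (wall clock) and `time.process_time` (CPU time of the current process), and includes the speedup definition. The task functions are hypothetical placeholders. Here `time.sleep` stands in for I/O or other system overhead.

```python
import time

def speedup(exec_time_y: float, exec_time_x: float) -> float:
    """Speedup of X relative to Y = Execution time_Y / Execution time_X."""
    return exec_time_y / exec_time_x

def measure(task):
    """Run `task` once; return (wall-clock seconds, CPU seconds used by this process)."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    task()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start

def waits_for_io():
    # Wall-clock time passes while the CPU sits idle.
    time.sleep(0.2)

def computes():
    # Pure computation: wall-clock and CPU time roughly agree on an idle machine.
    sum(i * i for i in range(2_000_000))

for task in (waits_for_io, computes):
    wall, cpu = measure(task)
    print(f"{task.__name__}: wall {wall:.3f} s, CPU {cpu:.3f} s")

print(speedup(10.0, 4.0))  # Y takes 10 s, X takes 4 s: X is 2.5 times faster
```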
- Look at SPEC handout Fig. 1.16
Principles of Computer Design
- Take Advantage of Parallelism
  - e.g. multiple processors, disks, memory banks, pipelining, multiple functional units
- Principle of Locality
  - Reuse of data and instructions (more on next slides)
- Focus on the Common Case
  - Amdahl’s Law (more on next slides)
Principles
The Principle of Locality
- The Principle of Locality:
  - Programs access a relatively small portion of the address space at any instant of time.
- Two Different Types of Locality:
  - Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
  - Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)
- For the last 30 years, hardware has relied on locality for memory performance
[Figure: processor (P), cache ($), and memory (MEM)]
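A toy cache simulator makes both kinds of locality measurable. The sketch below models a direct-mapped cache with hypothetical dimensions (64 lines of 64 bytes, 4 KB total) and 4-byte array elements. It then compares three address traces: sequential access (spatial locality), access strided by a whole block (no spatial locality), and repeated loops over a small array (temporal locality). This is a simplified model, not a description of any real processor's cache.

```python
def count_hits(addresses, block_size=64, num_lines=64):
    """Count hits in a toy direct-mapped cache (hypothetical sizes: 64 lines x 64 bytes).
    Each byte address belongs to one block, and each block maps to exactly one line."""
    tags = [None] * num_lines
    hits = 0
    for addr in addresses:
        block = addr // block_size
        line, tag = block % num_lines, block // num_lines
        if tags[line] == tag:
            hits += 1
        else:
            tags[line] = tag  # miss: bring the whole block into this line
    return hits

INT = 4  # bytes per array element (assumed)

# Spatial locality: walk a 16 KB array in order. After the first miss in each
# 64-byte block, the next 15 elements are already cached.
sequential = [i * INT for i in range(4096)]

# No spatial locality: stride by a full block, so every access touches a new block.
strided = [i * 64 for i in range(4096)]

# Temporal locality: loop 10 times over a 1 KB array that fits in the cache.
looped = [i * INT for _ in range(10) for i in range(256)]

for name, trace in (("sequential", sequential), ("strided", strided), ("looped", looped)):
    print(f"{name:>10}: hit rate {count_hits(trace) / len(trace):.3f}")
```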
Levels of the Memory Hierarchy

Level             Capacity          Access time               Cost
CPU registers     100s of bytes     300-500 ps (0.3-0.5 ns)
L1 and L2 cache   10s-100s KB       ~1 ns - ~10 ns            $1000s/GB
Main memory       GBs               80-200 ns                 ~$100/GB
Disk              10s of TB         10 ms (10,000,000 ns)     ~$1/GB
Tape              infinite          sec-min                   ~$1/GB

Staging transfer units between levels (upper level: faster; lower level: larger):

Between levels           Transfer unit                      Managed by
Registers <-> L1 cache   instruction operands, 1-8 bytes    program/compiler
L1 cache <-> L2 cache    blocks, 32-64 bytes                cache controller
L2 cache <-> memory      blocks, 64-128 bytes               cache controller
Memory <-> disk          pages, 4K-8K bytes                 OS
Disk <-> tape            files, MBs                         user/operator
Focus on the Common Case
- Common sense guides computer design
  - Since it's engineering, common sense is valuable
- In making a design trade-off, favor the frequent case over the infrequent case
  - E.g., the instruction fetch and decode unit is used more frequently than the multiplier, so optimize it first
  - E.g., if a database server has 50 disks per processor, storage dependability dominates system dependability, so optimize it first
- The frequent case is often simpler and can be done faster than the infrequent case
  - E.g., overflow is rare when adding two numbers, so improve performance by optimizing the more common case of no overflow
  - This may slow down overflow, but overall performance improves by optimizing for the normal case
- What is the frequent case, and how much is performance improved by making that case faster? => Amdahl’s Law
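Amdahl’s Law

$$
\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[ (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right]
$$

$$
\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}
$$

Best you could ever hope to do:

$$
\text{Speedup}_{\text{maximum}} = \frac{1}{1 - \text{Fraction}_{\text{enhanced}}}
$$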
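Amdahl's Law in code is a one-liner, but running it on numbers shows how quickly the unenhanced fraction caps the overall gain. In the sketch below, the 40% fraction and the 10x enhancement are hypothetical inputs.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the old execution time
    is made `speedup_enhanced` times faster."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

def amdahl_limit(fraction_enhanced: float) -> float:
    """Best possible overall speedup: the enhanced part takes no time at all."""
    return 1.0 / (1.0 - fraction_enhanced)

# Hypothetical case: 40% of the time is spent in code that becomes 10x faster.
print(amdahl_speedup(0.4, 10))  # about 1.56
print(amdahl_limit(0.4))        # about 1.67, however large speedup_enhanced gets
```

Making the enhanced 40% infinitely fast still leaves the other 60% untouched. So the common case, not the enhancement factor, determines the ceiling.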
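Principles of Computer Design
- The Processor Performance Equation

$$
\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}
$$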
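The processor performance equation multiplies three independent factors: instruction count, cycles per instruction (CPI), and seconds per cycle. The sketch below evaluates it for a hypothetical program of 2 billion instructions with an average CPI of 1.5 on a 3 GHz clock.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = Instruction count x CPI x Clock cycle time, with cycle time = 1 / clock rate."""
    return instruction_count * cpi / clock_rate_hz

# Hypothetical program: 2e9 instructions, average CPI 1.5, 3 GHz clock.
print(f"{cpu_time(2e9, 1.5, 3e9):.2f} s")
```

Because the factors multiply, a 10% cut in instruction count, CPI, or cycle time each gives the same 10% cut in CPU time, as long as the other two factors stay fixed.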
Principles of Computer Design
- Different instruction types having different CPIs
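When instruction types have different CPIs, a common way to apply the performance equation is per class: CPU clock cycles = sum of IC_i × CPI_i. The average CPI is then each CPI_i weighted by that class's share of instructions. The sketch below applies this weighting to a hypothetical instruction mix and clock rate; none of the numbers come from the slides.

```python
# Hypothetical instruction mix: type -> (instruction count, CPI).
mix = {
    "ALU":    (50_000_000, 1.0),
    "load":   (20_000_000, 2.0),
    "store":  (10_000_000, 2.0),
    "branch": (20_000_000, 1.5),
}

clock_cycles = sum(ic * cpi for ic, cpi in mix.values())  # sum of IC_i x CPI_i
instruction_count = sum(ic for ic, _ in mix.values())
average_cpi = clock_cycles / instruction_count            # CPI_i weighted by IC_i / IC
clock_rate_hz = 2e9                                       # hypothetical 2 GHz clock
cpu_seconds = clock_cycles / clock_rate_hz

print(f"Average CPI: {average_cpi:.2f}, CPU time: {cpu_seconds * 1e3:.1f} ms")
```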