Computer Architecture, Background and Motivation Slide 1
Part I Background and Motivation
Computer Architecture, Background and Motivation Slide 2
I Background and Motivation
Topics in This Part
Chapter 1 Combinational Digital Circuits
Chapter 2 Digital Circuits with Memory
Chapter 3 Computer System Technology
Chapter 4 Computer Performance
Computer Architecture, Background and Motivation Slide 3
1 Combinational Digital Circuits
Topics in This Chapter
1.1 Signals, Logic Operators, and Gates
1.2 Boolean Functions and Expressions
1.3 Designing Gate Networks
1.4 Useful Combinational Parts
1.5 Programmable Combinational Parts
1.6 Timing and Circuit Considerations
Computer Architecture, Background and Motivation Slide 4
1.1 Signals, Logic Operators, and Gates
Figure 1.1 Some basic elements of digital logic circuits, with operator signs used in this book highlighted.
Name | Operator sign and alternate(s) | Output is 1 iff | Arithmetic expression
NOT | x′ (¬x or x̄) | Input is 0 | 1 − x
AND | xy (x ∧ y) | Both inputs are 1s | x × y or xy
OR | x ∨ y (x + y) | At least one input is 1 | x + y − xy
XOR | x ⊕ y (x ≢ y) | Inputs are not equal | x + y − 2xy
The figure also shows the graphical symbol for each gate.
Computer Architecture, Background and Motivation Slide 5
Variations in Gate Symbols
Figure 1.2 Gates with more than two inputs and/or with inverted signals at input or output.
OR NOR NAND AND XNOR
Computer Architecture, Background and Motivation Slide 6
Gates as Control Elements
Figure 1.3 An AND gate and a tristate buffer act as controlled switches or valves. An inverting buffer is logically the same as a NOT gate.
(a) AND gate for controlled transfer: data in x, enable/pass signal e, data out x or 0
(b) Tristate buffer: data in x, enable/pass signal e, data out x or “high impedance”
(c) Model for AND switch: output is ex, that is, x when e = 1 and 0 when e = 0
(d) Model for tristate buffer: output is x when e = 1 and no data when e = 0
Computer Architecture, Background and Motivation Slide 7
Wired OR and Bus Connections
Figure 1.4 Wired OR allows tying together of several controlled signals.
(a) Wired OR of product terms: data out (x, y, z, or 0)
(b) Wired OR of tristate outputs: data out (x, y, z, or high impedance)
Computer Architecture, Background and Motivation Slide 8
Control/Data Signals and Signal Bundles
Figure 1.5 Arrays of logic gates represented by a single gate symbol.
(a) 8 NOR gates (b) 32 AND gates, sharing an Enable signal (c) k XOR gates, sharing a Compl signal
Computer Architecture, Background and Motivation Slide 9
1.2 Boolean Functions and Expressions
Ways of specifying a logic function
Truth table: 2^n rows, “don’t-care” in input or output
Logic expression: w′(x ∨ y ∨ z), product-of-sums, sum-of-products, equivalent expressions
Word statement: Alarm will sound if the door is opened while the security system is engaged, or when the smoke detector is triggered
Logic circuit diagram: Synthesis vs analysis
Computer Architecture, Background and Motivation Slide 10
Table 1.2 Laws (basic identities) of Boolean algebra.
Name of law | OR version | AND version
Identity | x ∨ 0 = x | x 1 = x
One/Zero | x ∨ 1 = 1 | x 0 = 0
Idempotent | x ∨ x = x | x x = x
Inverse | x ∨ x′ = 1 | x x′ = 0
Commutative | x ∨ y = y ∨ x | x y = y x
Associative | (x ∨ y) ∨ z = x ∨ (y ∨ z) | (x y) z = x (y z)
Distributive | x ∨ (y z) = (x ∨ y)(x ∨ z) | x (y ∨ z) = (x y) ∨ (x z)
DeMorgan’s | (x ∨ y)′ = x′ y′ | (x y)′ = x′ ∨ y′
Manipulating Logic Expressions
Computer Architecture, Background and Motivation Slide 11
Proving the Equivalence of Logic Expressions
Example 1.1
Truth-table method: Exhaustive verification
Arithmetic substitution: x ∨ y = x + y − xy; x ⊕ y = x + y − 2xy
Case analysis: two cases, x = 0 or x = 1
Logic expression manipulation
Example: x ⊕ y =? x′y ∨ xy′; in arithmetic form, x + y − 2xy =? (1 − x)y + x(1 − y) − (1 − x)yx(1 − y)
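The truth-table and arithmetic-substitution methods can be sketched in a few lines of Python. This is only an illustration: the function names (`xor_sop`, `xor_arith`, `or_arith`, `equivalent`) are made up for this sketch and do not come from the slides.

```python
from itertools import product

def xor_sop(x, y):
    # Sum-of-products form of XOR: x'y OR xy'
    return ((1 - x) & y) | (x & (1 - y))

def xor_arith(x, y):
    # Arithmetic substitution for XOR: x + y - 2xy
    return x + y - 2 * x * y

def or_arith(a, b):
    # Arithmetic substitution for OR: a + b - ab
    return a + b - a * b

def equivalent(f, g, n):
    """Truth-table method: compare f and g on all 2**n input rows."""
    return all(f(*row) == g(*row) for row in product((0, 1), repeat=n))

# Exhaustive verification of x XOR y against x'y OR xy'
print(equivalent(xor_sop, xor_arith, 2))

# Arithmetic form of the right-hand side: OR applied to (1 - x)y and x(1 - y)
rhs = lambda x, y: or_arith((1 - x) * y, x * (1 - y))
print(equivalent(xor_arith, rhs, 2))
```

The exhaustive check needs 2^n evaluations, so it suits small n; for larger expressions, case analysis or algebraic manipulation scales better.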
Computer Architecture, Background and Motivation Slide 12
1.3 Designing Gate Networks
AND-OR, NAND-NAND, OR-AND, NOR-NOR
Logic optimization: cost, speed, power dissipation
(a) AND-OR circuit (b) Intermediate circuit (c) NAND-NAND equivalent
Figure 1.6 A two-level AND-OR circuit and two equivalent circuits.
DeMorgan’s law used in the conversion: (x y)′ = x′ ∨ y′
Computer Architecture, Background and Motivation Slide 13
BCD-to-Seven-Segment Decoder
Example 1.2
Figure 1.8 The logic circuit that generates the enable signal for the lowermost segment (number 3) in a seven-segment display unit.
Input: 4-bit value x3 x2 x1 x0 in [0, 9]. Outputs: signals e0 through e6 that enable or turn on segments 0 through 6.
Computer Architecture, Background and Motivation Slide 14
1.4 Useful Combinational Parts
High-level building blocks
Much like prefab parts used in building a house
Arithmetic components will be covered in Part III (adders, multipliers, ALUs)
Here we cover three useful parts: multiplexers, decoders/demultiplexers, encoders
Computer Architecture, Background and Motivation Slide 15
Multiplexers
Figure 1.9 Multiplexer (mux), or selector, allows one of several inputs to be selected and routed to output depending on the binary value of a set of selection or address signals provided to it.
(a) 2-to-1 mux (b) Switch view (c) Mux symbol
(d) Mux array (e) 4-to-1 mux with enable (f) 4-to-1 mux design
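As a behavioral sketch of the multiplexer idea, the Python below models a 2-to-1 mux and builds a 4-to-1 mux with enable out of three 2-to-1 muxes. The names `mux2` and `mux4`, the argument order, and the choice to output 0 when disabled are assumptions made for illustration, not details taken from the figure.

```python
def mux2(x0, x1, y):
    """2-to-1 mux: route x0 to the output when y = 0, x1 when y = 1."""
    return x1 if y else x0

def mux4(x, y1, y0, e=1):
    """4-to-1 mux built from three 2-to-1 muxes.

    x is a list of four data inputs, (y1, y0) is the 2-bit address,
    and e is an enable; a disabled mux outputs 0 (an assumption).
    """
    low = mux2(x[0], x[1], y0)    # selects among inputs 0 and 1
    high = mux2(x[2], x[3], y0)   # selects among inputs 2 and 3
    return mux2(low, high, y1) if e else 0

data = [0, 1, 1, 0]
for address in range(4):
    print(address, mux4(data, address >> 1, address & 1))
```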
Computer Architecture, Background and Motivation Slide 16
Decoders/Demultiplexers
Figure 1.10 A decoder allows the selection of one of 2^a options using an a-bit address as input. A demultiplexer (demux) is a decoder that only selects an output if its enable signal is asserted.
(a) 2-to-4 decoder (b) Decoder symbol (c) Demultiplexer, or decoder with “enable”
Signals: address inputs y1 y0, outputs x0 through x3, and enable e for the demultiplexer.
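A decoder and its enabled variant can be modeled behaviorally as follows. This is a sketch under stated assumptions: the function name `decoder`, the MSB-first address order, and the list-of-bits output format are illustrative choices.

```python
def decoder(address_bits, enable=1):
    """a-to-2**a decoder with optional enable (demultiplexer behavior).

    address_bits lists the address MSB first, e.g. [y1, y0].
    Returns a list [x0, x1, ...] with a single 1 at the addressed
    position, or all 0s when enable is 0.
    """
    index = 0
    for bit in address_bits:
        index = (index << 1) | bit
    return [int(bool(enable) and i == index)
            for i in range(2 ** len(address_bits))]

print(decoder([1, 0]))            # address 2 selects x2
print(decoder([1, 0], enable=0))  # nothing selected when disabled
```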
Computer Architecture, Background and Motivation Slide 17
Encoders
Figure 1.11 A 2^a-to-a encoder outputs an a-bit binary number equal to the index of the single 1 among its 2^a inputs.
(a) 4-to-2 encoder (b) Encoder symbol
Signals: inputs x0 through x3, outputs y1 y0.
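The encoder performs the inverse mapping of a decoder. The sketch below assumes the input list has a power-of-2 length and contains exactly one 1; the name `encoder` and the MSB-first output order are illustrative assumptions.

```python
def encoder(x):
    """2**a-to-a encoder: x is a one-hot list [x0, x1, ...].

    Returns the index of the single 1 as a list of a bits, MSB first.
    Raises ValueError if the input is not one-hot, since the encoder
    output is only defined for a single asserted input.
    """
    if sum(x) != 1:
        raise ValueError("exactly one input must be 1")
    a = (len(x) - 1).bit_length()   # number of output bits
    index = x.index(1)
    return [(index >> i) & 1 for i in reversed(range(a))]

print(encoder([0, 0, 1, 0]))  # input x2 asserted
```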
Computer Architecture, Background and Motivation Slide 18
1.5 Programmable Combinational Parts
Programmable ROM (PROM)
Programmable array logic (PAL)
Programmable logic array (PLA)
A programmable combinational part can do the job of many gates or gate networks
Programmed by cutting existing connections (fuses) or establishing new connections (antifuses)
Computer Architecture, Background and Motivation Slide 19
PROMs
Figure 1.12 Programmable connections and their use in a PROM.
(a) Programmable OR gates (inputs w, x, y, z)
(b) Logic equivalent of part a
(c) Programmable read-only memory (PROM): a decoder on the inputs followed by programmable OR gates that produce the outputs
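One way to picture a PROM is as a fixed decoder whose outputs drive programmable OR gates. The Python sketch below models the programmable connections as a 0/1 matrix; the names `prom_read` and `HALF_ADDER`, and the half-adder contents, are hypothetical examples rather than anything shown on the slide.

```python
def prom_read(connections, address):
    """Read one word from a PROM model.

    connections[i][j] is 1 if decoder output i is connected to
    OR gate j (a programmed link), and 0 otherwise.
    """
    rows = len(connections)
    one_hot = [int(i == address) for i in range(rows)]   # fixed decoder
    outputs = len(connections[0])
    # Each output is the OR of the decoder lines connected to it.
    return [int(any(one_hot[i] and connections[i][j] for i in range(rows)))
            for j in range(outputs)]

# Hypothetical contents: a half adder, address = 2x + y, outputs [sum, carry]
HALF_ADDER = [[0, 0],
              [1, 0],
              [1, 0],
              [0, 1]]

for address in range(4):
    print(address, prom_read(HALF_ADDER, address))
```

Because every input combination has its own decoder line, a PROM can realize any function of its inputs, at the cost of 2^a rows for a inputs.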
Computer Architecture, Background and Motivation Slide 20
PALs and PLAs
Figure 1.13 Programmable combinational logic: general structure and two classes known as PAL and PLA devices. Not shown is PROM with fixed AND array (a decoder) and programmable OR array.
(a) General programmable combinational logic: inputs feed an AND array (AND plane), whose outputs feed an OR array (OR plane) that produces the outputs
(b) PAL: programmable AND array, fixed OR array (shown with 8-input ANDs)
(c) PLA: programmable AND and OR arrays (shown with 6-input ANDs and 4-input ORs)
Computer Architecture, Background and Motivation Slide 21
1.6 Timing and Circuit Considerations
Gate delay: from a fraction of a nanosecond to a few nanoseconds
Wire delay, previously negligible, is now important (electronic signals travel about 15 cm per ns)
Circuit simulation to verify function and timing
Changes in gate/circuit output, triggered by changes in its inputs, are not instantaneous
Computer Architecture, Background and Motivation Slide 22
Glitching
Figure 1.14 Timing diagram for a circuit that exhibits glitching.
Waveforms shown: x = 0, y, z, a = x ∨ y, and f = a ∨ z, with the gate delays marked on a and f.
Using the PAL in Fig. 1.13b to implement f = x ∨ y ∨ z
Computer Architecture, Background and Motivation Slide 23
CMOS Transmission Gates
Figure 1.15 A CMOS transmission gate and its use in building a 2-to-1 mux.
(a) CMOS transmission gate (TG): circuit, built from a P and an N transistor, and symbol
(b) Two-input mux built of two transmission gates
Computer Architecture, Background and Motivation Slide 24
2 Digital Circuits with Memory
Second of two chapters containing a review of digital design:
• Combinational (memoryless) circuits in Chapter 1
• Sequential circuits (with memory) in Chapter 2
Topics in This Chapter
2.1 Latches, Flip-Flops, and Registers
2.2 Finite-State Machines
2.3 Designing Sequential Circuits
2.4 Useful Sequential Parts
2.5 Programmable Sequential Parts
2.6 Clocks and Timing of Events
Computer Architecture, Background and Motivation Slide 25
2.1 Latches, Flip-Flops, and Registers
Figure 2.1 Latches, flip-flops, and registers.
(a) SR latch (b) D latch (c) Master-slave D flip-flop (d) D flip-flop symbol (e) k-bit register
Computer Architecture, Background and Motivation Slide 26
Latches vs Flip-Flops
Figure 2.2 Operations of D latch and negative-edge-triggered D flip-flop.
Waveforms shown: D, C, the D latch output Q, and the D flip-flop output Q, with setup and hold times marked.
Computer Architecture, Background and Motivation Slide 27
Reading and Modifying FFs in the Same Cycle
Figure 2.3 Register-to-register operation with edge-triggered flip-flops.
Diagram: two k-bit registers of D flip-flops connected through a computation module (combinational logic). The clock waveform marks the propagation delay.
Computer Architecture, Background and Motivation Slide 28
2.2 Finite-State Machines
Example 2.1
Figure 2.4 State table and state diagram for a vending machine coin reception unit.
Current state | Next state on Dime | Next state on Quarter | Next state on Reset
S00 | S10 | S25 | S00
S10 | S20 | S35 | S00
S20 | S30 | S35 | S00
S25 | S35 | S35 | S00
S30 | S35 | S35 | S00
S35 | S35 | S35 | S00
S00 is the initial state; S35 is the final state.
The state diagram shows the same transitions, starting from S00, with arcs labeled Dime, Quarter, and Reset.
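The state table of Figure 2.4 maps directly onto a lookup structure. The following Python sketch encodes the transitions as read from the slide; the names `NEXT_STATE` and `run`, and the lowercase event strings, are illustrative assumptions.

```python
# Next-state table for the coin reception unit (dimes and quarters,
# final state S35 reached once 35 cents or more has been deposited).
NEXT_STATE = {
    "S00": {"dime": "S10", "quarter": "S25"},
    "S10": {"dime": "S20", "quarter": "S35"},
    "S20": {"dime": "S30", "quarter": "S35"},
    "S25": {"dime": "S35", "quarter": "S35"},
    "S30": {"dime": "S35", "quarter": "S35"},
    "S35": {"dime": "S35", "quarter": "S35"},
}

def run(events, state="S00"):
    """Apply a sequence of 'dime', 'quarter', or 'reset' events."""
    for event in events:
        # Reset returns to the initial state from any state.
        state = "S00" if event == "reset" else NEXT_STATE[state][event]
    return state

print(run(["dime", "dime", "quarter"]))
print(run(["quarter", "reset", "dime"]))
```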
Computer Architecture, Background and Motivation Slide 29
Sequential Machine Implementation
Figure 2.5 Hardware realization of Moore and Mealy sequential machines.
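Blocks shown: next-state logic, state register, and output logic. The inputs and the present state feed the next-state logic, which produces the next-state excitation signals for the state register. The output logic derives the outputs from the present state and, only for a Mealy machine, from the inputs as well. Signal bundles are labeled with widths n, m, and l.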
Computer Architecture, Background and Motivation Slide 30
2.3 Designing Sequential Circuits
Example 2.3
Figure 2.7 Hardware realization of a coin reception unit (Example 2.3).
Diagram: three D flip-flops (FF2, FF1, FF0) hold the state. Inputs: q (quarter in) and d (dime in). Output: e. Final state is 1xx.
Computer Architecture, Background and Motivation Slide 31
2.4 Useful Sequential Parts
High-level building blocks
Much like prefab closets used in building a house
Other memory components will be covered in Chapter 17 (SRAM details, DRAM, Flash)
Here we cover three useful parts: shift register, register file (SRAM basics), counter
Computer Architecture, Background and Motivation Slide 32
Shift Register
Figure 2.8 Register with single-bit left shift and parallel load capabilities. For logical left shift, serial data in line is connected to 0.
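Signals: k-bit parallel data in, k-bit parallel data out, serial data in, serial data out (the MSB), and the Shift and Load controls. A mux at the input of the k-bit register selects between the parallel data in and the k − 1 LSBs joined with the serial data in.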
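A behavioral sketch of such a register in Python is given below. The class name `ShiftRegister`, the method name `clock`, and the rule that Load takes priority over Shift are assumptions made for illustration.

```python
class ShiftRegister:
    """k-bit register with parallel load and single-bit left shift."""

    def __init__(self, k):
        self.k = k
        self.value = 0

    def clock(self, load=0, shift=0, parallel_in=0, serial_in=0):
        """Apply one clock edge; return the serial data out (old MSB)."""
        mask = (1 << self.k) - 1
        msb = (self.value >> (self.k - 1)) & 1
        if load:        # assumption: Load takes priority over Shift
            self.value = parallel_in & mask
        elif shift:
            # Keep the k - 1 LSBs, shifted left, and append serial_in.
            self.value = ((self.value << 1) | (serial_in & 1)) & mask
        return msb

reg = ShiftRegister(4)
reg.clock(load=1, parallel_in=0b1011)
out = reg.clock(shift=1, serial_in=0)   # logical left shift
print(out, format(reg.value, "04b"))
```

With serial_in tied to 0, each shift is a logical left shift, matching the note in the figure caption.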
Computer Architecture, Background and Motivation Slide 33
Register File and FIFO
Figure 2.9 Register file with random access and FIFO.
(a) Register file with random access: 2^h k-bit registers; a decoder driven by the h-bit write address and the write enable signal; muxes driven by the h-bit read addresses 0 and 1 and the read enable signal; k-bit write data, read data 0, and read data 1
(b) Graphic symbol for register file
(c) FIFO symbol: k-bit input and output, Push and Pop controls, Full and Empty indicators
Computer Architecture, Background and Motivation Slide 34
SRAM
Figure 2.10 SRAM memory is simply a large, single-port register file.
(a) SRAM block diagram: h-bit address, g-bit data in, g-bit data out, and the write enable, chip select, and output enable signals
(b) SRAM read mechanism: the row part of the address drives a row decoder that reads one row of a square or almost square memory matrix into a row buffer; the column part drives a column mux that selects the g bits of data out
Computer Architecture, Background and Motivation Slide 35
Binary Counter
Figure 2.11 Synchronous binary counter with initialization capability.
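Components shown: count register (holding x), incrementer (computing x + 1, with c_in = 1 and a c_out output), and a mux with inputs 0 and 1. Signals: Input, Load, and Incr/Init.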
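The counter can be sketched behaviorally as a register whose next value is either an initial value or x + 1. The Python below is an assumption-laden illustration: the class name `BinaryCounter`, the parameter names, and the exact roles given to the Load, Init, and Input signals are guesses at the figure's intent, not a transcription of it.

```python
class BinaryCounter:
    """k-bit synchronous counter with initialization."""

    def __init__(self, k):
        self.k = k
        self.x = 0

    def clock(self, load=1, init=0, init_value=0):
        """Apply one clock edge; return the incrementer's carry-out."""
        mask = (1 << self.k) - 1
        total = self.x + 1             # incrementer with c_in = 1
        c_out = total >> self.k        # 1 when the count wraps around
        if load:                       # register updates only when loaded
            # The mux picks the initial value or the incremented count.
            self.x = (init_value if init else total) & mask
        return c_out

counter = BinaryCounter(2)
for _ in range(5):
    carry = counter.clock()
    print(counter.x, carry)
```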
Computer Architecture, Background and Motivation Slide 36
2.5 Programmable Sequential Parts
Programmable array logic (PAL)
Field-programmable gate array (FPGA)
Both types contain macrocells and interconnects
A programmable sequential part contains gates and memory elements
Programmed by cutting existing connections (fuses) or establishing new connections (antifuses)
Computer Architecture, Background and Motivation Slide 37
PAL and FPGA
Figure 2.12 Examples of programmable sequential logic.
(a) Portion of PAL with storable output: 8-input ANDs feed a D flip-flop, with muxes on the flip-flop's output paths
(b) Generic structure of an FPGA: configurable logic blocks (CLBs), I/O blocks, and programmable connections
Computer Architecture, Background and Motivation Slide 39
2.6 Clocks and Timing of Events
Clock is a periodic signal: clock rate = clock frequency
The inverse of clock rate is the clock period: 1 GHz ↔ 1 ns
Constraint: Clock period ≥ t_prop + t_comb + t_setup + t_skew
Figure 2.13 Determining the required length of the clock period.
Diagram: FF1 (driven by Clock1) feeds combinational logic, which also receives other inputs, and the result is captured by FF2 (driven by Clock2).
Timing labels: FF1 begins to change; FF1 change observed; the clock period must be wide enough to accommodate worst-case delays.
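The clock-period constraint is a simple sum of worst-case delays, as the short Python sketch below shows. The delay values are hypothetical, chosen only so that the result matches the 1 GHz and 1 ns pairing mentioned above; the function names are likewise illustrative.

```python
def min_clock_period_ps(t_prop, t_comb, t_setup, t_skew):
    """Smallest clock period (in ps) that satisfies
    period >= t_prop + t_comb + t_setup + t_skew."""
    return t_prop + t_comb + t_setup + t_skew

def max_clock_rate_ghz(period_ps):
    """Clock rate is the inverse of the period: 1000 ps <-> 1 GHz."""
    return 1000 / period_ps

# Hypothetical worst-case delays, in picoseconds
period = min_clock_period_ps(t_prop=100, t_comb=600, t_setup=100, t_skew=200)
print(period, "ps ->", max_clock_rate_ghz(period), "GHz")
```

Integer picoseconds are used here to keep the arithmetic exact; any consistent time unit works.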
Computer Architecture, Background and Motivation Slide 40
Synchronization
Figure 2.14 Synchronizers are used to prevent timing problems arising from untimely changes in asynchronous signals.
(a) Simple synchronizer (one FF) (b) Two-FF synchronizer (FF1 followed by FF2) (c) Input and output waveforms
Signals: asynch input, synch version, clock.
Computer Architecture, Background and Motivation Slide 41
Level-Sensitive Operation
Figure 2.15 Two-phase clocking with nonoverlapping clock signals.
Diagram: latches clocked by phases 1 and 2 alternate with combinational logic blocks 1 and 2, each of which also receives other inputs. The two clocks have nonoverlapping highs within each clock period.
Computer Architecture, Background and Motivation Slide 42
3 Computer System Technology
Interplay between architecture, hardware, and software
• Architectural innovations influence technology
• Technological advances drive changes in architecture
Topics in This Chapter
3.1 From Components to Applications
3.2 Computer Systems and Their Parts
3.3 Generations of Progress
3.4 Processor and Memory Technologies
3.5 Peripherals, I/O, and Communications
3.6 Software Systems and Applications
Computer Architecture, Background and Motivation Slide 43
3.1 From Components to Applications
Figure 3.1 Subfields or views in computer system engineering.
Views, from high level to low level: application domains, application designer, system designer, computer designer, logic designer, circuit designer, electronic components.
Software lies toward the high-level view and hardware toward the low-level view. Computer architecture and computer organization are marked as regions within this spectrum.
Computer Architecture, Background and Motivation Slide 44
What Is (Computer) Architecture?
Figure 3.2 Like a building architect, whose place at the engineering/arts and goals/means interfaces is seen in this diagram, a computer architect reconciles many conflicting or competing demands.
The architect sits at two interfaces: goals versus means, and arts versus engineering.
Client’s taste: mood, style, . . .
Client’s requirements: function, cost, . . .
The world of arts: aesthetics, trends, . . .
Construction technology: material, codes, . . .
Computer Architecture, Background and Motivation Slide 45
3.2 Computer Systems and Their Parts
Figure 3.3 The space of computer systems, with what we normally mean by the word “computer” highlighted.
Classification tree:
Computer: analog or digital
Digital: fixed-function or stored-program
Stored-program: electronic or nonelectronic
Electronic: general-purpose or special-purpose
General-purpose: number cruncher or data manipulator
Computer Architecture, Background and Motivation Slide 46
Price/Performance Pyramid
Figure 3.4 Classifying computers by computational power and price range.
Super: $Millions
Mainframe: $100s Ks
Server: $10s Ks
Workstation: $1000s
Personal: $100s
Embedded: $10s
Differences in scale, not in substance
Computer Architecture, Background and Motivation Slide 47
Automotive Embedded Computers
Figure 3.5 Embedded computers are ubiquitous, yet invisible. They are found in our automobiles, appliances, and many other places.
Labels in the figure: engine, impact sensors, navigation & entertainment, central controller, brakes, airbags.
Computer Architecture, Background and Motivation Slide 48
Personal Computers and Workstations
Figure 3.6 Notebooks, a common class of portable computers, are much smaller than desktops but offer substantially the same capabilities. What are the main reasons for the size difference?
Computer Architecture, Background and Motivation Slide 49
Digital Computer Subsystems
Figure 3.7 The (three, four, five, or) six main units of a digital computer. Usually, the link unit (a simple bus or a more elaborate network) is not explicitly included in such diagrams.
Units shown: memory; control and datapath (together the processor, or CPU); input and output (together I/O); and link (to/from network).
Computer Architecture, Background and Motivation Slide 50
3.3 Generations of Progress
Table 3.2 The 5 generations of digital computers, and their ancestors.
Generation (begun) | Processor technology | Memory innovations | I/O devices introduced | Dominant look & feel
0 (1600s) | (Electro-)mechanical | Wheel, card | Lever, dial, punched card | Factory equipment
1 (1950s) | Vacuum tube | Magnetic drum | Paper tape, magnetic tape | Hall-size cabinet
2 (1960s) | Transistor | Magnetic core | Drum, printer, text terminal | Room-size mainframe
3 (1970s) | SSI/MSI | RAM/ROM chip | Disk, keyboard, video monitor | Desk-size mini
4 (1980s) | LSI/VLSI | SRAM/DRAM | Network, CD, mouse, sound | Desktop/laptop micro
5 (1990s) | ULSI/GSI/WSI, SOC | SDRAM, flash | Sensor/actuator, point/click | Invisible, embedded
Computer Architecture, Background and Motivation Slide 51
Figure 3.8 The manufacturing process for an IC part.
IC Production and Yield
Process flow: silicon crystal ingot, slicer, blank wafer with defects, processing (20-30 steps), patterned wafer (100s of simple or scores of complex processors), dicer, die, die tester, good die, mounting, microchip or other part, part tester, usable part to ship.
Dimensions labeled in the figure: 15-30 cm, 30-60 cm, 0.2 cm, and ~1 cm for a die.
Computer Architecture, Background and Motivation Slide 52
Figure 3.9 Visualizing the dramatic decrease in yield with larger dies.
Effect of Die Size on Yield
Wafer with small dies: 120 dies, 109 good. Wafer with large dies: 26 dies, 15 good.
Die yield =def (number of good dies) / (total number of dies)
Die yield = Wafer yield × [1 + (Defect density × Die area) / a]^(−a)
Die cost = (cost of wafer) / (total number of dies × die yield) = (cost of wafer) × (die area / wafer area) / (die yield)
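The yield and cost formulas can be explored numerically with a short Python sketch. The function names and all parameter values in the example (defect density, die areas, the exponent a, wafer cost, wafer area) are hypothetical; only the die counts 109 of 120 and 15 of 26 come from the slide.

```python
def die_yield(wafer_yield, defect_density, die_area, a):
    """Die yield = wafer yield * [1 + (defect density * die area) / a]**(-a)."""
    return wafer_yield * (1 + defect_density * die_area / a) ** (-a)

def die_cost(wafer_cost, wafer_area, die_area, yield_fraction):
    """Die cost = wafer cost * (die area / wafer area) / die yield."""
    return wafer_cost * (die_area / wafer_area) / yield_fraction

# Measured yields from the two wafers in Figure 3.9
print(round(109 / 120, 3), round(15 / 26, 3))

# Hypothetical parameters: doubling the die area lowers the yield,
# so the cost per good die more than doubles.
small = die_yield(wafer_yield=0.95, defect_density=0.5, die_area=1.0, a=3)
large = die_yield(wafer_yield=0.95, defect_density=0.5, die_area=2.0, a=3)
print(die_cost(1000, 700, 1.0, small), die_cost(1000, 700, 2.0, large))
```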
Computer Architecture, Background and Motivation Slide 53
3.4 Processor and Memory Technologies
Figure 3.11 Packaging of processor, memory, and other components.
(a) 2D or 2.5D packaging now common: dies (CPU, memory) mounted on PC boards that plug through connectors into a backplane carrying the bus
(b) 3D packaging of the future: stacked layers glued together, with interlayer connections deposited on the outside of the stack
Computer Architecture, Background and Motivation Slide 54
Figure 3.10 Trends in processor performance and DRAM memory chip capacity (Moore’s law).
Moore’s Law
Axes: calendar year (1980 to 2010); processor performance (kIPS, MIPS, GIPS, TIPS); memory chip capacity (kb, Mb, Gb, Tb).
Processor data points: 68000, 80286, 80386, 80486, 68040, Pentium, Pentium II, R10000. Growth rate: ×1.6 / yr, or ×2 / 18 mos, or ×10 / 5 yrs.
Memory data points: 64kb, 256kb, 1Mb, 4Mb, 16Mb, 64Mb, 256Mb, 1Gb. Growth rate: ×4 / 3 yrs.
Computer Architecture, Background and Motivation Slide 55
3.5 Input/Output and Communications
Figure 3.12 Magnetic and optical disk memory units.
(a) Cutaway view of a hard disk drive (b) Some removable storage media: floppy disk, CD-ROM, magnetic tape cartridge
Size label in the figure: typically 2-9 cm.
Computer Architecture, Background and Motivation Slide 56
Figure 3.13 Latency and bandwidth characteristics of different classes of communication links.
Communication Technologies
Axes: latency in seconds, from 10^−9 (ns) through 10^−6 (μs) and 10^−3 (ms) to 1 and 10^3 (min to h); bandwidth in b/s, from 10^3 to 10^12.
Same geographic location: processor bus, I/O network, system-area network (SAN), local-area network (LAN).
Geographically distributed: metro-area network (MAN), wide-area network (WAN).
Computer Architecture, Background and Motivation Slide 57
3.6 Software Systems and Applications
Figure 3.15 Categorization of software, with examples in each class.
Software
  Application: word processor, spreadsheet, circuit simulator, . . .
  System
    Translator: MIPS assembler, C compiler, . . .
    Operating system
      Manager: virtual memory, security, file system, . . .
      Coordinator: scheduling, load balancing, diagnostics, . . .
      Enabler: disk driver, display driver, printing, . . .
Computer Architecture, Background and Motivation Slide 58
Figure 3.14 Models and abstractions in programming.
High- vs Low-Level Programming
Very high-level language objectives or tasks (translated by an interpreter):
  Swap v[i] and v[i+1]
High-level language statements (translated by a compiler):
  temp=v[i]
  v[i]=v[i+1]
  v[i+1]=temp
Assembly language instructions, mnemonic (translated by an assembler):
  add $2,$5,$5
  add $2,$2,$2
  add $2,$4,$2
  lw $15,0($2)
  lw $16,4($2)
  sw $16,0($2)
  sw $15,4($2)
  jr $31
Machine language instructions, binary (hex):
  00a51020 00421020 00821020 8c620000 8cf20004 acf20000 ac620004 03e00008
One task = many statements; one statement = several instructions; assembly to machine language is mostly one-to-one.
Toward the top: more abstract, machine-independent; easier to write, read, debug, or maintain
Toward the bottom: more concrete, machine-specific, error-prone; harder to write, read, debug, or maintain
Computer Architecture, Background and Motivation Slide 59
4 Computer Performance
Performance is key in design decisions; also cost and power
• It has been a driving force for innovation
• Isn’t quite the same as speed (higher clock rate)
Topics in This Chapter
4.1 Cost, Performance, and Cost/Performance
4.2 Defining Computer Performance
4.3 Performance Enhancement and Amdahl’s Law
4.4 Performance Measurement vs Modeling
4.5 Reporting Computer Performance
4.6 The Quest for Higher Performance
Computer Architecture, Background and Motivation Slide 60
The Vanishing Computer Cost
Axes: calendar year (1960 to 2020); computer cost ($1, $1 K, $1 M, $1 G).
Computer Architecture, Background and Motivation Slide 61
Figure 4.1 Performance improvement as a function of cost.
Cost/Performance
Performance
Cost
Superlinear: economy of scale
Sublinear: diminishing returns
Linear (ideal?)
Computer Architecture, Background and Motivation Slide 62
4.2 Defining Computer Performance
Figure 4.2 Pipeline analogy shows that imbalance between processing power and I/O capabilities leads to a performance bottleneck.
Processing Input Output
CPU-bound task
I/O-bound task
Computer Architecture, Background and Motivation Slide 63
4.4 Performance Measurement vs. Modeling
Figure 4.5 Running times of six programs on three machines.
Axes: execution time versus program (A through F), with one curve each for Machine 1, Machine 2, and Machine 3.
Computer Architecture, Background and Motivation Slide 64
4.5 Reporting Computer Performance
Table 4.4 Measured or estimated execution times for three programs.
Program | Time on machine X | Time on machine Y | Speedup of Y over X
Program A | 20 | 200 | 0.1
Program B | 1000 | 100 | 10.0
Program C | 1500 | 150 | 10.0
All 3 prog’s | 2520 | 450 | 5.6
Analogy: If a car is driven to a city 100 km away at 100 km/hr and returns at 50 km/hr, the average speed is not (100 + 50) / 2 but is obtained from the fact that it travels 200 km in 3 hours.
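The overall speedup and the car analogy both come down to dividing totals rather than averaging ratios. The Python sketch below recomputes the figures from Table 4.4 and the analogy; the variable names are illustrative.

```python
# Execution times from Table 4.4
times_x = {"A": 20, "B": 1000, "C": 1500}
times_y = {"A": 200, "B": 100, "C": 150}

# Per-program speedup of Y over X
speedups = {p: times_x[p] / times_y[p] for p in times_x}
print(speedups)

# Overall speedup: total time on X divided by total time on Y
overall = sum(times_x.values()) / sum(times_y.values())
print(round(overall, 1))

# Car analogy: 100 km at 100 km/hr, then 100 km back at 50 km/hr
total_distance = 100 + 100
total_time = 100 / 100 + 100 / 50       # 1 hour out, 2 hours back
print(round(total_distance / total_time, 1), "km/hr, not", (100 + 50) / 2)
```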
Computer Architecture, Background and Motivation Slide 65
Comparing the Overall Performance
Table 4.4 Measured or estimated execution times for three programs.
Program | Time on machine X | Time on machine Y | Speedup of Y over X | Speedup of X over Y
Program A | 20 | 200 | 0.1 | 10
Program B | 1000 | 100 | 10.0 | 0.1
Program C | 1500 | 150 | 10.0 | 0.1
Arithmetic mean | | | 6.7 | 3.4
Geometric mean | | | 2.15 | 0.46
Geometric mean does not yield a measure of overall speedup, but provides an indicator that at least moves in the right direction
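The two kinds of mean can be recomputed from the per-program speedups with a few lines of Python. The function names are illustrative; the input ratios are those of Table 4.4.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    return math.prod(values) ** (1 / len(values))

y_over_x = [0.1, 10.0, 10.0]   # speedups of Y over X for programs A, B, C
x_over_y = [10.0, 0.1, 0.1]    # speedups of X over Y

print(round(arithmetic_mean(y_over_x), 1), round(geometric_mean(y_over_x), 2))
print(round(arithmetic_mean(x_over_y), 1), round(geometric_mean(x_over_y), 2))

# The arithmetic means make each machine look faster than the other,
# while the two geometric means are reciprocals of each other.
print(round(geometric_mean(y_over_x) * geometric_mean(x_over_y), 6))
```

Both arithmetic means exceed 1, which is contradictory; the geometric means are at least mutually consistent, which is the sense in which they move in the right direction.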
Computer Architecture, Background and Motivation Slide 66
4.6 The Quest for Higher Performance
State of available computing power ca. the early 2000s:
Gigaflops on the desktop
Teraflops in the supercomputer center
Petaflops on the drawing board
Note on terminology (see Table 3.1)
Prefixes for large units: Kilo = 10^3, Mega = 10^6, Giga = 10^9, Tera = 10^12, Peta = 10^15
For memory: K = 2^10 = 1024, M = 2^20, G = 2^30, T = 2^40, P = 2^50
Prefixes for small units: micro = 10^−6, nano = 10^−9, pico = 10^−12, femto = 10^−15
Computer Architecture, Background and Motivation Slide 67
Figure 4.7 Exponential growth of supercomputer performance.
Axes: calendar year (1980 to 2010); supercomputer performance (MFLOPS, GFLOPS, TFLOPS, PFLOPS).
Vector supercomputers: Cray X-MP, Y-MP.
Massively parallel processors: CM-2, CM-5, $30M MPPs, $240M MPPs.