computer architecture - os3 · 2015-09-10 · bibliography useful texts are: computer architecture...

33
Computer Architecture R. Poss 1 ca01 - 10 september 2015

Upload: others

Post on 14-Mar-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Computer ArchitectureR. Poss

1 ca01 - 10 september 2015

Page 2: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Course & organization

2 ca01 - 10 september 2015

Page 3: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Aims of this courseThe aims of this course are:

to highlight current trends

to introduce the notion of hardware/software interface

to introduce caching and parallelism in computer architecture

to introduce simulators & architecture models

3 ca01 - 10 september 2015

Page 4: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

BibliographyUseful texts are:

Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0.

Processor Architecture, Silc, Robic and Ungerer, Springer, ISBN 13-540-64798-8

D. Sima, T. Fountain and P. Kacsuk, Advanced Computer Architecture a Design space approach (Addison-Wesley)

Your own search-fu -- use Google Scholar!

4 ca01 - 10 september 2015

Page 5: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Course overviewSession 1:

1. Trends

2. Hw/Sw interface

3. Pipelining

Session 2:

1. Memory and caching

2. Multi-cores and HMT

5 ca01 - 10 september 2015

Page 6: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

What is ”computer architecture”?

6 ca01 - 10 september 2015

Page 7: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

An engineering domain

SYSTEMS ARCHITECTURESOFTWARE ENGINEERING

ELECTRONIC ENGINEERING COMPUTER ARCHITECTURE

metaloxides

semi-conductors

CMOS

Processors

Storage

metals Links

Embeddedsystems

NMOSmagneticsubstrates

Backplanes

refined matter

petro-chemicalspackaging

FunctionalUnits

Memories

Caches

electronics logic circuits components platforms

layers of composition and complexity: from parts to whole

Softwareprograms

Computingplatforms

Algorithms

Frameworks

Operatingsoftware

systems

Networks

Computationalclusters

Personal computers

Game consoles

GaA/Si/SiGe/SiCcrystals

You are here

The hidden partner activity: compilers, operating systems

7 ca01 - 10 september 2015

Page 8: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Trends

8 ca01 - 10 september 2015

Page 9: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

The Big ChangePower vs chip area:

before: power free, transistors expensive

now: power expensive, transistors cheap

Storage vs computation:

before: computation slow, storage fast

now: storage slow, computation fast

Computation vs storage cost:

before: small storage, ok to compute more to save space (eg compression)

now: large storage, expensive to compute more

9 ca01 - 10 september 2015

Page 10: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Moore’s law

“the number of transistors on integrated circuits doubles

approximately every two years”

10 ca01 - 10 september 2015

Page 11: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Moore’s lawWhy/how:

CMOS: logic based on semiconductor gates in silicon, DRAM: single-gate memory cells

laser photolithography to sculpt gates at atomic scale

Fundamental limits:

can’t make CMOS smaller than atoms in silicon

difficult to increase precision of lasers in manufacturing

Probable evolutions:

number of transistors per unit of area in silicon will stabilize

likely: larger chips + 3D designs with multiple layers of silicon (more area)

11 ca01 - 10 september 2015

Page 12: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Latency lags bandwidth

Latency Bandwidth Transistors

Processors /30 x3000 x10000+

Networks /20 x1000

Memory /4 x200 x100000+

Disks /10 x200

Improvements over ca. 20 years

12 ca01 - 10 september 2015

Page 13: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Latency lags bandwidthMoore’s law helps bandwidth more than latency

More transistors + more pins = more bandwidth

Distance limits latency, storage capacity increases distance

More transistors = relatively longer lines

We will study this later in the context of memories

Market bias: bandwidth easier to sell, so more investment there

13 ca01 - 10 september 2015

Page 14: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Latency lags bandwidthLatency helps bandwidth, but not the other way around

eg: faster disk spin rate: shorter access times, more requests by second

but: more disks in parallel = more bandwidth, same latency

Bandwidth hurts latency

Queues help bandwidth, hurts latency (queuing theory)

adding parallelism actually increases latency (cf later lecture)

14 ca01 - 10 september 2015

Page 15: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Latency lags bandwidthSummary:

For 1 component, bandwidth increases by square of latency decrease

Parallelism allows to scale bandwidth arbitrarily, but keeps latency constant or increases

Similar ratios for performance vs execution time

These trends are there to stay

15 ca01 - 10 september 2015

Page 16: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Trends - “the free lunch is over”

http://www.gotw.ca/publications/concurrency-ddj.htm

This is the story of uniprocessor performance

This is the power wall

This is the memory wall + seq. performance wall

16 ca01 - 10 september 2015

Page 17: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

The Big Change (cont.)A dramatic change in processor chips:

Memory wall: processors much faster than memories

Power wall: can’t power all transistors lest the chip will fry

Sequential performance wall: more transistors don’t help sequential performance any more

This course will suggest how we got here, why these problems are happening and what we can do about it

17 ca01 - 10 september 2015

Page 18: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Principles of Comp. Arch.

18 ca01 - 10 september 2015

Page 19: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Contribution of Comp. Arch.Quantitative principles of design

Take advantage of parallelism

Principle of locality

Focus on the common case

Amdahl’s laws

Careful, quantitative comparison: define, quantify, summarize

Anticipating and exploiting advances in technology

Well-defined interfaces, carefully implemented and thoroughly defined

19 ca01 - 10 september 2015

Page 20: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

ParallelismThree main strategies:

Increase bandwidth and throughput by duplicating storage and data paths

Use pipelining, ie “assembly line”

Perform operations out of order, including simultaneously

Fundamental limits:

pipeline hazards

time and data dependencies = mandatory order

20 ca01 - 10 september 2015

Page 21: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

LocalityPrinciple: individual programs access a relatively small portion of memory in a small amount of time

Two different types:

Temporal locality: if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)

Spatial locality: if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)

Caches are a fundamental mechanism to take advantage of locality

21 ca01 - 10 september 2015

Page 22: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Memory hierarchy and latency

Registers - on-chip SRAM

L1 cache - on-chip SRAM

L2 cache - on-chip SRAM

off-chip SRAM

L3 cache - off-chip SRAM

Main memory - DRAM

Distributed memory

< 1 1-2 2-6 4-8 ≃10 ≥10 ≥100

< 1 1-6 ≃10 ≥10 ≥100 ≥1000 ≥10000

100MHz clocks

GHz clocks

22-1 ca01 - 10 september 2015

Page 23: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Memory hierarchy and latency

Registers - on-chip SRAM

L1 cache - on-chip SRAM

L2 cache - on-chip SRAM

off-chip SRAM

L3 cache - off-chip SRAM

Main memory - DRAM

Distributed memorySize

< 1 1-2 2-6 4-8 ≃10 ≥10 ≥100

< 1 1-6 ≃10 ≥10 ≥100 ≥1000 ≥10000

100MHz clocks

GHz clocks

22-2 ca01 - 10 september 2015

Page 24: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Memory hierarchy and latency

Registers - on-chip SRAM

L1 cache - on-chip SRAM

L2 cache - on-chip SRAM

off-chip SRAM

L3 cache - off-chip SRAM

Main memory - DRAM

Distributed memorySize

1/cycle time

< 1 1-2 2-6 4-8 ≃10 ≥10 ≥100

< 1 1-6 ≃10 ≥10 ≥100 ≥1000 ≥10000

100MHz clocks

GHz clocks

22-3 ca01 - 10 september 2015

Page 25: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Focus on the common caseIn making a design trade-off, favor the frequent case over the infrequent case

E.g., Instruction fetch and decode unit used more frequently than multiplier, so optimize it 1st

E.g., If database server has 50 disks / processor, storage dependability dominates system dependability, so optimize it 1st

Frequent case is often simpler and can be done faster than the infrequent case

What is frequent case and how much performance improved by making case faster => Amdahl’s Law

23 ca01 - 10 september 2015

Page 26: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Amdahl’s law on speedupConsider a computation P which contains two parts A and B in sequence

A can be enhanced (eg more parallelism, more performance); B cannot

T(P) = T(A) + T(B) ( T = time to complete )

Imagine we can accelerate A infinitely so that T(A) becomes 0

Intuitively: overall speedup is limited by T(B)

If the complexity ratio between A and B is P[A/B] (proportion), and A can be accelerated by a factor SA (speedup), Amdalh’s law says:

Soverall = 1 / ( (1 - P[A/B]) + (P[A/B] / SA) )

24 ca01 - 10 september 2015

Page 27: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Amdahl’s law exampleAn algorithm contains a sequential section and a parallel section

The parallel section contains 20% of the computation steps (P=0.2)

The parallel section can be accelerated by a factor N by using N processors/cores

Maximum speedup with N cores = 1 / ( ( 1 - 0.2) + (0.2 / N) )

With N = 100, speedup = 1.24X (100 cores, yet only 24% perf increase!)

This is the fundamental limit to parallelism: to maximize performance gains, need to first increase the proportion of the parallel section.

25 ca01 - 10 september 2015

Page 28: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Amdahl’s law on designA balanced system design should provision 1 bit per second of external bandwidth for each potential instruction per second

Too little external bandwidth: “I/O bound”

Too little instructions/second: “compute-bound”

“Desktop” computers are traditionally I/O bound

Mainframes are usually compute bound

Multi-cores require huge amount of bandwidth to stay balanced

26 ca01 - 10 september 2015

Page 29: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

The Hw/Sw interface

27 ca01 - 10 september 2015

Page 30: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

How do programs run?

General computer model: processor + memory + interconnect + I/O devices

Software is just bits, so is data

How does software translate into behavior? ie. communication, computation and control?

Your take here

28 ca01 - 10 september 2015

Page 31: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

System initializationWhat happens when you switch the computer on?

Define/explain the relationships between:

Reset signal

Initial program counter

Boot ROM

Boot code

Operating system

Start-up storage

CPU

ROM RAM

I/O interface

Disks

29 ca01 - 10 september 2015

Page 32: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Why simulators? Why not native?

This course will talk about the processor(s) in your desktops/laptop machines

But all x86 processors are really RISC “under the hood”

More useful to study RISC to understand the main problems

Also: for your lab assignments you will study low-level architecture behavior under control of assembly code

Easier with a simulator than real hardware!

30 ca01 - 10 september 2015

Page 33: Computer Architecture - OS3 · 2015-09-10 · Bibliography Useful texts are: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0

Summary

What you can take away:

What is computer architecture and why it is important

Some general principles of comp. arch. science

Intro to the hardware/software interface

31 ca01 - 10 september 2015