
1 • Trends in Microprocessor Architecture

R05 Chip Multiprocessors (ACS MPhil)

Robert Mullins


Overview

• Computer architecture
• Scaling performance and CMOS
  – Where have performance gains come from?
  – Modern superscalar processors
  – The limits of superscalar processors
• Going parallel
• This course


Computer architecture

“Computer architecture is the interface between what technology can provide and what the marketplace demands”

“Computer architecture is a science of trade-offs”

Yale Patt


Computer architecture

“Computer architecture is the science and art of selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals” Mark Hill

“Computer architecture forms the bridge between application need and the capabilities of the underlying technology” Tilak Agerwala and Siddhartha Chatterjee


Computer architecture

• We cannot architect a new computer without defining performance, power and cost goals. The design process is all about understanding and making trade-offs

• What is our target market and what applications will we be running?

• The “best” architecture is a moving target
  – The needs of the marketplace change
  – Shifting fabrication technology characteristics
  – New technologies
    • memory, packaging, compiler, languages, ...


Computer architecture

“Computer architects often err by preparing for yesterday's computations” Bill Dally

(Easy to make the same error during a PhD!)

Tomorrow's applications and technologies are not easy to predict!


Historic performance gains

Reproduced from “Computer architecture: A quantitative approach”, Hennessy/Patterson


Historic performance gains

Burger's “end of the road” paper suggested performance growth would be limited to 12.5%/annum

Predicted (1997-2014): 7.4x

Actual: ~36x

If at historical rate: 1720x
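
The figures on this slide are compound-growth arithmetic, and a short sketch makes the annual rates behind them explicit. The function names below are illustrative, not from the lecture; the 17-year span is taken from the 1997-2014 range quoted above.

```python
def growth_factor(annual_rate: float, years: int) -> float:
    """Total improvement after compounding `annual_rate` for `years` years."""
    return (1.0 + annual_rate) ** years


def implied_annual_rate(total_factor: float, years: int) -> float:
    """Annual rate that compounds to `total_factor` over `years` years."""
    return total_factor ** (1.0 / years) - 1.0


YEARS = 2014 - 1997  # 17 years, the span quoted on the slide

predicted = growth_factor(0.125, YEARS)             # 12.5%/annum, close to 7.4x
actual_rate = implied_annual_rate(36.0, YEARS)      # rate implied by ~36x
historic_rate = implied_annual_rate(1720.0, YEARS)  # rate implied by 1720x

print(f"12.5%/annum over {YEARS} years: {predicted:.1f}x")
print(f"~36x implies about {actual_rate:.1%} per annum")
print(f"1720x implies about {historic_rate:.1%} per annum")
```

Under these assumptions the ~36x actual gain corresponds to a bit under 25% per annum, well above the predicted limit but far below the rate implied by the historical extrapolation.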


Microprocessor trends

https://github.com/karlrupp/microprocessor-trend-data


Historic performance gains

• Microprocessor performance increased at a rate of ~52%/year between 1986 and 2002
  – ~800X improvement over 16 years
  – How was such an improvement in performance achieved?
  – Is this a reasonable rate of performance growth given the advances in fabrication technology?

Exe. time = Instr. count x CPI x Clock Period
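
As a minimal sketch of the execution-time equation above, the snippet below evaluates it for two made-up design points and also checks that 52%/year compounds to roughly 800X over 16 years. The instruction counts, CPI values and clock periods are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def execution_time(instr_count: float, cpi: float, clock_period_s: float) -> float:
    """Exe. time = Instr. count x CPI x Clock Period (result in seconds)."""
    return instr_count * cpi * clock_period_s


# Hypothetical baseline: 1e9 instructions, CPI of 2.0, 100 MHz clock (10 ns period).
baseline = execution_time(1e9, 2.0, 10e-9)

# Hypothetical improved design: same instruction count, CPI halved, 1 GHz clock.
improved = execution_time(1e9, 1.0, 1e-9)

speedup = baseline / improved  # gains in CPI and clock period multiply

# Sanity check on the slide's headline figure: 52%/year for 16 years.
compound = 1.52 ** 16

print(f"baseline {baseline:.1f} s, improved {improved:.1f} s, speedup {speedup:.0f}x")
print(f"52%/year over 16 years: about {compound:.0f}x")
```

Because the three terms multiply, a 2X CPI improvement combined with a 10X shorter clock period gives a 20X speedup in this example.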


Historic performance gains

• Technology scaling
  – 7 process generations
  – Scaling provides ~1.4x transistor performance improvement per generation
  – 10.5X
  – (careful, this doesn't automatically translate directly into performance gains)

Reproduced with kind permission of Mark Horowitz


Historic performance gains

• Gates per clock
  – Less logic between pipeline registers
  – Reduction from ~100 to 10 gate delays
  – 10X
• How?
  – Pipelining
    • 5 to 20 stages (~4X)
  – Circuit-level advances
    • e.g. new logic families
    • ~2.5X

Reproduced with kind permission of Mark Horowitz


Historic performance gains

~105X

Reproduced from “CMOS VLSI Design” Weste/Harris (2005)


Historic performance gains

• IPC & instr. count
  – ~5-8X improvement in SPECint/MHz
  – This is despite clock frequency improvements
  – Includes advances in compiler technology and impact of increased bus widths

Improvement in SPECint95/MHz over time. Reproduced with kind permission of Mark Horowitz
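
The individual factors quoted on the preceding slides can be multiplied together to see how they account for the ~800X overall gain. This is a rough sketch that assumes the factors combine multiplicatively; the variable names are illustrative.

```python
# Technology scaling: 7 process generations at ~1.4x each.
technology_scaling = 1.4 ** 7

# Gates per clock: ~100 gate delays reduced to ~10.
gates_per_clock = 100 / 10
pipelining, circuits = 4.0, 2.5  # ~4X x ~2.5X, the two contributions quoted

# Clock-related gain: faster transistors and less logic per stage.
clock_gain = technology_scaling * gates_per_clock

# IPC and instruction count: ~5-8X improvement in SPECint/MHz.
ipc_low, ipc_high = 5.0, 8.0
total_low = clock_gain * ipc_low
total_high = clock_gain * ipc_high

print(f"technology scaling: {technology_scaling:.1f}x")
print(f"clock-related gain: {clock_gain:.0f}x")
print(f"overall estimate:   {total_low:.0f}x to {total_high:.0f}x")
```

The resulting range brackets the ~800X figure, with the upper end of the SPECint/MHz improvement needed to reach it.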


Historic performance gains

• How was it possible to maintain and even decrease CPI (improve IPC)?
  – Moore's law!
  – How were the additional transistors exploited?
• Intel 386 to Pentium 4
  – 386: 275K transistors (die size = 43mm²)
  – P4: 42M transistors (die size = 217mm²)
    • 5X from increased die size
    • 27X from technology scaling
• Today's (2017) largest chips contain > 10 billion transistors
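
The 386 to Pentium 4 comparison can be checked with a few lines of arithmetic. This sketch splits the growth in transistor count into a die-area factor and a density factor, using only the rounded counts and die sizes quoted above; the variable names are illustrative.

```python
# Figures quoted on the slide (rounded).
transistors_386, area_386_mm2 = 275e3, 43.0
transistors_p4, area_p4_mm2 = 42e6, 217.0

transistor_ratio = transistors_p4 / transistors_386  # total growth in transistor count
area_ratio = area_p4_mm2 / area_386_mm2              # growth due to a larger die
density_ratio = transistor_ratio / area_ratio        # growth due to smaller transistors

print(f"transistor count: {transistor_ratio:.0f}x")
print(f"die area:         {area_ratio:.1f}x")
print(f"density:          {density_ratio:.0f}x")
```

The die-area factor comes out at about 5X, as on the slide. The density factor computed from these rounded inputs is about 30X, in the same range as the 27X quoted for technology scaling.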


Historic performance gains

Reproduced from CMOS VLSI Design, Weste and Harris (2005)


Moore’s Law


The future of Moore's Law: 2D to 3D

• Beyond 2021 it won't be economically desirable to shrink transistor dimensions

• Recently introduced vertical transistors (e.g. dual-gate and tri-gate)

• Monolithic 3D predicted by 2024

• Roadmap to consider applications in future (more of an end-to-end view vs. bottom-up)

The latest ITRS Roadmap (2015) predicts that physical gate length will not shrink beyond 2021. Earlier predictions (2013) were more optimistic.


Modern superscalar processors

• Revision (See Hennessy/Patterson)
  – Significant hardware support for Instruction Level Parallelism (ILP) in most commercial microprocessors
    • Multiple-issue architectures
    • Deep pipelines, branch prediction, speculative execution
    • Large on-chip caches (L1/L2/L3)
    • Out-of-order execution, register renaming
    • Dynamic memory address disambiguation
    • SIMD instructions
    • ...


Modern superscalar processors


Limits of superscalar processors

• Cost and complexity of extracting ILP
  – Diminishing returns
  – Increased complexity limits ability to optimise design
    • The underlying fabrication technology characteristics are becoming more challenging too
  – Increases verification complexity and time
  – Increases time-to-market


Limits of superscalar processors

• Pipeline depth limits
  – Interruptions to the pipeline (branches)
  – Performance of the memory system
  – Clocking overheads (registers/clock skew)
  – Need to balance stages and maintain the atomicity of some operations
  – Limited ILP
  – Power cost

(See also “Optimal Pipeline Depth” link on Seminar 1 wiki page)
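
To illustrate why clocking overheads and pipeline interruptions bound the useful pipeline depth, here is a deliberately simplified toy model. It is not taken from the lecture: it assumes the cycle time is the logic delay divided evenly across stages plus a fixed per-stage overhead, and that each hazard costs a number of cycles proportional to the depth. All parameter values are hypothetical.

```python
import math


def time_per_instruction(stages: int, logic_delay: float,
                         overhead: float, hazard_rate: float) -> float:
    """Toy model: cycle time multiplied by average cycles per instruction."""
    cycle_time = logic_delay / stages + overhead    # register/skew overhead per stage
    cycles_per_instr = 1.0 + hazard_rate * stages   # each hazard costs ~`stages` cycles
    return cycle_time * cycles_per_instr


def optimal_depth(logic_delay: float, overhead: float, hazard_rate: float) -> float:
    """Depth minimising the model above: sqrt(L / (o * h))."""
    return math.sqrt(logic_delay / (overhead * hazard_rate))


LOGIC_DELAY = 100.0  # hypothetical total logic depth, in gate delays
OVERHEAD = 2.5       # hypothetical per-stage clocking overhead, in gate delays
HAZARD_RATE = 0.1    # hypothetical fraction of instructions causing a flush

analytic = optimal_depth(LOGIC_DELAY, OVERHEAD, HAZARD_RATE)
best_by_search = min(
    range(1, 101),
    key=lambda n: time_per_instruction(n, LOGIC_DELAY, OVERHEAD, HAZARD_RATE),
)

print(f"analytic optimum: {analytic:.1f} stages, search optimum: {best_by_search} stages")
```

In this model, deepening the pipeline beyond the optimum makes each instruction slower on average, because the fixed overhead and the growing hazard penalty outweigh the shorter logic per stage. Memory-system effects, limited ILP and power are not modelled.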


Limits of superscalar processors

"Coming challenges in microarchitecture and architecture", Ronen et al, 2001


Limits of superscalar processors

• Interconnect versus transistor scaling
  – Smaller transistors = faster/lower power
  – Wires don't scale in the same way ☹
  – Centralised structures don't scale well
  – Pressure to decentralise
  – Consider bypass network between FUs
    • Clustered implementations


Limits of superscalar processors

• Voltage scaling and power limits
  – Voltage scaling has slowed
    • 5V to 1V - gave us 25X power savings
    • 1V to 0.7V (limit at end of CMOS around 2020)
    • Only 2X power savings left from voltage scaling!
  – Sensible power limits already reached
  – Pressure to reduce power consumption
• Process variation complications
  – Fault tolerance requirements in the longer term
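
The 25X and 2X figures follow from the fact that dynamic power scales roughly with the square of the supply voltage. A minimal sketch, assuming switched capacitance and frequency are held fixed and ignoring leakage (the function name is illustrative):

```python
def dynamic_power_saving(v_old: float, v_new: float) -> float:
    """Ratio of dynamic power before and after a supply-voltage change.

    Dynamic power is approximately proportional to C * V^2 * f, so with
    capacitance and frequency held fixed the saving is (v_old / v_new)^2.
    """
    return (v_old / v_new) ** 2


past_saving = dynamic_power_saving(5.0, 1.0)       # 5V to 1V
remaining_saving = dynamic_power_saving(1.0, 0.7)  # 1V to 0.7V

print(f"5V to 1V:   {past_saving:.0f}x")
print(f"1V to 0.7V: {remaining_saving:.1f}x")
```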


Going parallel

• Accept we can make little progress with single-thread performance

• Look towards thread-level parallelism
  – Achieve our performance gains in a new way:
  – Rapidly increase the number of cores
    • 2X-3X per generation
  – Don't scale the clock frequency
    • Create simpler more power efficient cores instead


Going parallel

Pawlowski (Intel), 2007

It is now 2018...

The number of cores has scaled less aggressively than this.

In 2017 @ 14nm,

High-end server part:

28 Core, Xeon (Skylake)

56 threads

Clock frequency 2.5GHz

(max turbo freq. 3.8GHz)

TDP (power) = 205 W


Going parallel

• Going parallel is simple?
  – Replicate existing processor designs to ease design process
  – Many applications already exist where thread-level parallelism is plentiful
  – We've had 30+ years of experience writing parallel programs


Going parallel

• Many new challenges:
  – On-chip and off-chip communication
  – Simpler cores and Amdahl's law
  – Power constrained design
  – Support for the shared-memory paradigm?
  – Synchronization and thread-scheduling support?
  – Everyone must now write scalable and correct parallel programs!
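
The tension between simpler cores and Amdahl's law can be sketched with the standard formula, speedup = 1 / ((1 - p) + p / n) for a parallel fraction p on n cores. The `core_perf` parameter below is an added assumption for illustration: it scales the speed of every core relative to a baseline core, so that many simple cores can be compared against fewer complex ones. The workload numbers are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int, core_perf: float = 1.0) -> float:
    """Speedup over one baseline core, per Amdahl's law.

    `core_perf` is the performance of each core relative to the baseline
    core (1.0 = same core, 0.5 = a simpler core at half the speed).
    """
    serial_fraction = 1.0 - parallel_fraction
    return core_perf / (serial_fraction + parallel_fraction / cores)


P = 0.95  # hypothetical: 95% of the work parallelises perfectly

big_cores = amdahl_speedup(P, cores=16)                     # 16 baseline cores
small_cores = amdahl_speedup(P, cores=32, core_perf=0.5)    # 32 half-speed cores
upper_bound = 1.0 / (1.0 - P)                               # limit as cores grows

for n in (2, 4, 16, 64):
    print(f"{n:3d} cores: {amdahl_speedup(P, n):.2f}x")
print(f"16 baseline cores {big_cores:.2f}x vs 32 half-speed cores {small_cores:.2f}x")
print(f"upper bound for p = {P}: {upper_bound:.0f}x")
```

With these numbers, doubling the core count does not compensate for halving per-core performance, because the serial fraction also runs on a slower core.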


Going parallel

• Power is a first order design constraint
  – Power consumption is already at a sensible limit (for many applications we would like to reduce it)
  – We are going to increase the number of cores by 2-3X per generation
• Power savings?
  – Core shrink (<1.4X)
  – Simpler cores (1.4-2X?)
  – Some VDD savings
  – Need to add “uncore” logic too!
  – Techniques for adaptive EPI?
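
A back-of-the-envelope sketch shows why the power budget is tight. It assumes total chip power is the number of cores times power per core, that the shrink and simpler-core savings multiply, and it leaves out VDD savings and uncore power entirely. The function name is illustrative; the ranges are the ones quoted on the slide.

```python
def per_core_saving_range(shrink_max: float, simpler_range: tuple) -> tuple:
    """Best-case range of per-core power reduction from shrink and simpler cores."""
    return shrink_max * simpler_range[0], shrink_max * simpler_range[1]


CORE_GROWTH = (2.0, 3.0)  # cores increase 2-3X per generation
SHRINK_MAX = 1.4          # core shrink gives less than 1.4X
SIMPLER = (1.4, 2.0)      # simpler cores give perhaps 1.4-2X

savings_low, savings_high = per_core_saving_range(SHRINK_MAX, SIMPLER)

# To hold total power constant, per-core power must fall by the core growth factor.
print(f"required per-core reduction:  {CORE_GROWTH[0]:.1f}x to {CORE_GROWTH[1]:.1f}x")
print(f"best-case available savings:  {savings_low:.2f}x to {savings_high:.2f}x")
```

Even in the best case these two savings fall slightly short of the reduction required, which is why the remaining items on the list (VDD, uncore, adaptive techniques) matter.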


Going parallel

• Beyond homogeneous multicore
  – Power consumption is a limiting factor in the design of multicore processors
  – For many designs this has prompted the integration of many specialized accelerators
    • An ASIC implementation of an algorithm may be 10-1000X more energy efficient than a software implementation
• e.g. Apple A8 SoC:
  – ~50% custom accelerators
  – ~25% CPUs (2)
  – ~25% GPU


Future of multicore?

• “NAVIGO”, [Hempstead, Wei and Brooks, 2011]

• Examined throughput-orientated workloads

• Suggests gains limited to 35% per year due to power constraints


Future gains

Need for applications to be approximation/fault tolerant

Node: “2nm/1.5nm”

Vertical Gate-All-Around-Device (GAA)

Monolithic-3D (stacking of devices)

VDD = 0.4V

IRDS Roadmap (2016)


This course

• Introduction to the challenges of building and programming chip multiprocessors
  – Lots to learn from traditional parallel computers, but many problems and trade-offs are new
    • New applications
    • The trade-offs on-chip are very different to those when designing physically larger parallel machines
    • Power and energy constraints
    • Parallel programming for the masses


This course

1. Trends in microprocessor architecture
2. Introduction to parallel computing
3. Parallel algorithms
4. Chip Multiprocessors (I)
5. Chip Multiprocessors (II)
6. Transactional memory
7. On-chip interconnection networks
8. Manycore research issues

• 2017: Rune Holm, ARM (Machine Learning Group)

• 2016: Gavin Stark, Netronome (CTO)

• 2014: David Moloney, Movidius (CTO)

• 2012: Matt Horsnell, ARM

• 2011: Eben Upton, Broadcom

• ...