0906 CMOS Scaling · 2017-09-05

Scaling Power and the Future of CMOS Mark Horowitz, EE/CS Stanford University


Post on 31-May-2020




A Long Time Ago

In a building far away

A man made a prediction

On surprisingly little data

That has defined an industry


Moore’s Law


CMOS Computer Performance

[Figure: SPECint 2006 performance (log scale, 0.01 to 100.00) versus year, 1988 through 2004. Processors plotted: Intel 486, Intel Pentium, Intel Pentium 2, Intel Pentium 3, Intel Pentium 4, Intel Itanium, Alpha 21064, Alpha 21164, Alpha 21264, Sparc, SuperSparc, Sparc64, Mips, HP PA, Power PC, AMD K6, AMD K7, AMD x86-64, IBM Power, Sun UltraSPARC, Intel Core 2, AMD Opteron, AMD Phenom]


CMOS Computer Performance

[Figure: SPECint 2006 performance (log scale, 0.01 to 100.00) versus year, with the time axis extended to cover 1988 through 2010, for the same set of processors as on the previous slide]


Moore’s Original Issues

Design cost
Power dissipation
What to do with all the functionality possible

ftp://download.intel.com/research/silicon/moorespaper.pdf


Scaling MOS Devices

In this ideal scaling
• V scales to αV, L scales to αL
• So C scales to αC, i scales to αi (i/μ is stable)
• Delay = CV/I scales as α
• Energy = CV² scales as α³

JSSC Oct 74, pg 256
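
The ideal-scaling rules on this slide can be checked with a few lines of arithmetic. This is only an illustrative sketch: the function name `ideal_scaling` and the normalized starting values of 1.0 are assumptions, not part of the talk. α is the linear scale factor, and the value 0.7 used below is a commonly quoted per-generation shrink, also an assumption.

```python
def ideal_scaling(alpha: float) -> dict:
    """Apply the slide's ideal scaling rules to normalized device quantities.

    All starting values are normalized to 1.0 (an assumption for illustration);
    alpha < 1 shrinks the device.
    """
    v = alpha * 1.0       # supply voltage: V -> alpha * V
    c = alpha * 1.0       # capacitance:    C -> alpha * C
    i = alpha * 1.0       # drive current:  i -> alpha * i
    delay = c * v / i     # Delay = CV/I, scales as alpha
    energy = c * v ** 2   # Energy = CV^2, scales as alpha^3
    return {"V": v, "C": c, "I": i, "delay": delay, "energy": energy}


# One generation at an assumed alpha of 0.7.
print(ideal_scaling(0.7))
```

Under these rules delay shrinks linearly with α while switching energy shrinks with its cube, which is why ideal scaling kept power density roughly constant.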


Processor Power

[Figure: processor power in Watts (log scale, 1 to 1000) versus year, 1988 through 2010]


Power Density

[Figure: power density in Watts/mm² (log scale, 0.01 to 1.00) versus year, 1985 through 2009]


Why Power Increased

[Figure: clock frequency in MHz (log scale, 10 to 10000) versus year, 1985 through 2009]


Good News

Die growth & super frequency scaling have stopped

[Figure: cycle time in FO4 delays (log scale, 10 to 100) versus year, 1985 through 2005]


Processor Power

They were high power too

[Figure: processor power (log scale, 1 to 100) versus year, 1985 through 2003]


Bad News

Voltage scaling has stopped as well
• kT/q does not scale
• Vth scaling has power consequences

If Vdd does not scale
• Energy scales slowly

Ed Nowak, IBM
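
A short worked comparison shows why a fixed supply voltage hurts. It uses only the relations from the Scaling MOS Devices slide (Energy = CV², with C scaling to αC). The primed symbol for the scaled device is notation introduced here, not taken from the talk.

```latex
\begin{align*}
\text{Ideal scaling: } & E' = (\alpha C)(\alpha V)^2 = \alpha^3\, C V^2 \\
\text{Fixed } V_{dd}\text{: } & E' = (\alpha C)\, V_{dd}^2 = \alpha\, C V_{dd}^2
\end{align*}
```

With an assumed α of 0.7 for one generation, switching energy drops to about 0.34x under ideal scaling but only to 0.7x when Vdd stays fixed.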


Technology Scaling Today

Device sizes are still scaling
• Cost/device is still scaling down
• This is what is driving scaling

Voltages are not scaling very fast
• Threshold voltages set by leakage
• Gate oxide thickness is set by leakage
  • This means that the channel lengths are not scaling
  • Current is increasing by stressing silicon

Now Vdd and Vth are set by optimization


Other Technologies

For computing, I am not optimistic

Current problems are set by Physics:
• Vdd set by kT/q
  • Sets the on-off ratio
• Wire energy by CVdd²

To get around these limitations
• Need to create something very different!


Problem with Different Technologies

Design processes have been optimized for silicon
• Working on making it better for over 30 years

Silicon has set:
• Notions of logic (binary signals), digital design styles
• Computing (distinct memory and logic)
• Relative size and speed of memory and logic

No new technology will fit this mold well
• Changing the world is hard
• If you build it, generally they don’t come
  • Unless they absolutely have to


Maturing of Silicon

Silicon will not disappear
• It will still be a huge business
• Growth rate is slower, eventually very slow scaling

Silicon will become like concrete and steel
• Basis of a huge industry
• Critical to nearly everything
• But fairly stable and predictable

Will remain the dominant substrate for computing
• And performance will be limited by power dissipation


Optimizing Energy

Every design is a point on a 2-D plane

[Figure: design points plotted with Performance on the horizontal axis and Energy on the vertical axis]
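
One way to read this plane: a design is worth keeping only if no other design delivers at least as much performance for no more energy. The sketch below filters a set of design points down to that efficient frontier. The function name `efficient_frontier` and the sample points are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Design = Tuple[float, float]  # (performance, energy); units are arbitrary


def efficient_frontier(designs: List[Design]) -> List[Design]:
    """Return the designs that are not dominated by any other design.

    A design is dominated if another design has performance >= and
    energy <=, with at least one of the two strictly better.
    """
    frontier = []
    best_energy = float("inf")
    # Walk from highest to lowest performance; ties go to the lower energy.
    for perf, energy in sorted(set(designs), key=lambda d: (-d[0], d[1])):
        # Keep a design only if it beats every faster-or-equal design on energy.
        if energy < best_energy:
            frontier.append((perf, energy))
            best_energy = energy
    return sorted(frontier)


# Hypothetical design points (performance, energy).
points = [(1.0, 1.0), (2.0, 3.0), (2.0, 2.5), (3.0, 6.0), (1.5, 4.0)]
print(efficient_frontier(points))
```

Points off the frontier waste either energy or performance, so optimization means moving a design onto the curve and then choosing where along it to sit.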


Years of Low Power Research …

Have shown only one design technique to reduce power
• Reduce waste

Can waste
• Energy (clock gating, leakage control, etc.)
• Performance
  • Adding additional constraints to operation flow

If technology scaling has stalled
• Need to focus on reducing waste in our systems
• Increase in efficiency in our designs will set performance


Future Systems

Some simple math
• Assume scaling continues
• Dies don’t shrink in size
• Average power/gate must decrease by 2x / gen
  • Or need to build systems that increase in power

Since gates are shrinking in size
• Get 1.4x from capacitive reduction
• Where is the other factor of 1.4x?
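
The arithmetic behind the missing factor can be written out directly. This sketch assumes that one generation doubles the gate count on a fixed-size die (so linear dimensions shrink by 1/√2) and that capacitance per gate tracks the linear dimension. The variable names are illustrative, not from the talk.

```python
import math

gates_growth = 2.0                 # assumed: gates per die double each generation
linear_shrink = 1 / math.sqrt(2)   # assumed: ~0.71x linear dimension per generation

# To hold total chip power constant, power per gate must fall as fast as
# the gate count rises.
required_reduction = gates_growth

# Capacitance per gate is taken to scale with the linear dimension.
capacitive_reduction = 1 / linear_shrink

# Whatever remains must come from somewhere else (voltage, frequency,
# activity, or better design).
remaining_factor = required_reduction / capacitive_reduction

print(f"capacitive: {capacitive_reduction:.2f}x, remaining: {remaining_factor:.2f}x")
```

Both factors come out to √2, about 1.4x, matching the slide: shrinking alone supplies only half of the required improvement on a log scale.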


The Push for Parallelism

[Figure: Watts/Spec*L*Vdd² (log scale, 0.01 to 1) versus Spec2000*L (log scale, 1 to 1000), with an annotation marking 10 parallel processors. Processors plotted: Intel 386, Intel 486, Intel Pentium, Intel Pentium 2, Intel Pentium 3, Intel Pentium 4, Intel Itanium, Alpha 21064, Alpha 21164, Alpha 21264, Sparc, SuperSparc, Sparc64, Mips, HP PA, Power PC, AMD K6, AMD K7, AMD x86-64]


Exploit Parallelism / Scale Vdd

If you have parallelism
• Add more function units
  • Fill up new die (2x)
• Lower energy/op
  • ΔE/ΔP will decrease
  • Vdd, sizes, etc. will reduce
  • Build simpler architectures

Works well when ΔE/ΔP is large
• But what happens when that runs out?

[Figure: Energy/op versus Performance]
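
A rough numerical sketch of this trade: spread the same throughput over more units, run each one slower, and lower Vdd until it just meets the slower clock. The delay model here (an alpha-power law, f ∝ (Vdd − Vth)^a / Vdd) and the values Vth = 0.3 and a = 1.3 are assumptions introduced for illustration; they are not from the talk. Leakage and parallelization overhead are ignored.

```python
def relative_frequency(vdd: float, vth: float = 0.3, a: float = 1.3) -> float:
    """Assumed alpha-power-law model: frequency ~ (Vdd - Vth)^a / Vdd."""
    return (vdd - vth) ** a / vdd


def vdd_for_frequency(target: float, vth: float = 0.3, a: float = 1.3,
                      lo: float = 0.31, hi: float = 2.0) -> float:
    """Bisect for the lowest Vdd in [lo, hi] whose frequency meets target.

    Relies on relative_frequency increasing with Vdd above Vth.
    """
    for _ in range(60):
        mid = (lo + hi) / 2
        if relative_frequency(mid, vth, a) < target:
            lo = mid
        else:
            hi = mid
    return hi


def energy_per_op_ratio(n_units: int, vdd_nominal: float = 1.0) -> float:
    """Energy/op with n_units slower units, relative to one unit at vdd_nominal.

    Throughput is held constant, so each unit runs at 1/n_units of the
    nominal frequency. Energy/op is taken as proportional to Vdd^2.
    """
    f_nominal = relative_frequency(vdd_nominal)
    vdd = vdd_for_frequency(f_nominal / n_units)
    return (vdd / vdd_nominal) ** 2


for n in (1, 2, 4, 10):
    print(n, round(energy_per_op_ratio(n), 3))
```

In this model the savings shrink with each added unit because Vdd can never drop below Vth, which is one way to picture ΔE/ΔP running out.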


Problem Reformulation

Best way to save energy is to do less work
• Energy directly reduced by the reduction in work
• But required time for the function decreases as well
  • Convert this into extra power gains
• Shifts the optimal curve down and to the right

[Figure: Energy/op versus User Performance]


Exploit Specialization

Optimize execution units for specific applications
• Reformulate the hardware to reduce needed work
• Can improve energy efficiency for a class of applications

DSP/Vector engines are more efficient than CPUs
• Exploit locality, reuse
• High compute density

ASICs are more efficient than DSP/Vector engines
• If we want efficiency, we need more application optimization


ASIC/SOC Design Trends

Rising non-recurring engineering costs
• Increasing design complexity
• Growing verification complexity
• Challenging physical design
• Rising mask costs

Few markets can justify ASIC NRE
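
The NRE argument is a matter of amortization: a fixed design cost only becomes tolerable when it is spread over enough units. The sketch below uses the $20M design cost quoted later in the talk; the $10 unit cost, the volumes, and the function name `cost_per_chip` are hypothetical.

```python
def cost_per_chip(nre_dollars: float, volume: int, unit_cost: float) -> float:
    """Amortized cost per chip: NRE spread over the volume, plus unit cost."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return nre_dollars / volume + unit_cost


# Hypothetical volumes and unit cost; the $20M NRE figure is from the talk.
for volume in (10_000, 1_000_000, 100_000_000):
    print(volume, cost_per_chip(20e6, volume, unit_cost=10.0))
```

At low volume the NRE term dominates the price of each chip, which is why only a few markets can carry a custom design.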


ASIC Future Depends on Your Religion

Believe in correct by construction?

Believe in a generic high-level design language?

Historically neither has worked
• I believe history is correct

Allowing people to connect complex blocks
• Yields a complex validation problem, and a $20M+ design
• General SoC, SiP will never be cheap


Computing’s Future:

Create a new universal computing platform
• That is more efficient than today’s/tomorrow’s CMP
• Bill Dally is working on this one

Leverage existing large volume processors for other applications
• GPUs moving into general processing
• OMAP being distributed as Unix system


Can We Do Better?

Chip design is expensive since chips are complex

But the building blocks are well known
• Many of the optimizations are well known too
• Designers often do many of the same steps
• Part of the reason for off-shoring
  • Don’t need experience

Getting the system to work is hard
• There is a lot of turning the crank that is needed

Can we automate some of the crank turning?


Chip Generator Idea

[Figure: three paths from an Application to a final product]

Process:
• ASIC design
• Configure a programmable chip
• Configure / program a virtual chip, then generate optimized chip

Final Product:
• Custom System ($$$)
• Configured System (Not Efficient)
• Semi-custom System


Another Way To Put It…

[Figure: Performance per Watt (per app) plotted against Amortized Cost Structure for three options]

ASIC
• Best perf/watt per app
• Highest costs
• Specific for a single app

Programmable
• Worse perf/watt per app
• Amortized costs
• Wider app domain

CMP Generator
• Excellent perf/watt per app
• Amortized costs
• Wide app domain


Smart Memories - A “Pretend” Generator

[Figure: block diagrams of processor tiles built from processors, memory blocks (labeled M, D, and T), and crossbars, together with a data cache, an instruction cache, and function units (FU). Panel captions: “Smart Memories Architecture” and “Single Tile Chip Generator Derivative”]


Looks Promising

Large energy / performance gains are possible:
• Use H.264 as example application
• Use a GP-CMP chip generator

400-600X initial perf. gap


New And Exciting Challenges

Chip Multiprocessor Generator

Potential Benefits (Wajahat, Rehan, Megan)
• Is it useful? How much better than CMP? How much worse than ASIC?

Process (Ofer)
• How does the user use a generator? How would a generator be built? How is the chip specified?

Optimization (Omid, Pete)
• Given an application and a target (power / perf.), how do we find the best “chip plan”?

Verification (Megan, Ofer)
• Verification of a chip is difficult. How do we verify a generator?


Conclusions

The technology engine driving IT is slowing down
• Power efficiency is the real problem

Application optimization leads to efficiency
• But design is too expensive today to do this

Need to rethink design
• Build chip generators not chips
  • These are virtual programmable chips
  • Have tools generate the real chips that customers want