0906 cmos scaling · 2017-09-05 · 4 cmos computer performance specint 2006 0.01 0.10 1.00 10.00...
TRANSCRIPT
Scaling Power and the Future of CMOS
Mark Horowitz, EE/CS Stanford University
2
A Long Time Ago
In a building far away
A man made a prediction
On surprisingly little data
That has defined an industry
3
Moore’s Law
4
CMOS Computer Performance
Specint 2006
0.01
0.10
1.00
10.00
100.00
88 90 92 94 96 98 00 02 04 0
intel 486intel pent iumintel pent ium 2intel pent ium 3intel pent ium 4intel itaniumAlpha 21064Alpha 21164Alpha 21264SparcSuperSparcSparc64M ipsHP PAPower PCAM D K6AM D K7AM D x86-64IBM PowerSUN UltraSPARCIntel Core 2AM D OpteronAM D Phenom
5
CMOS Computer Performance
Specint 2006
0.01
0.10
1.00
10.00
100.00
88 90 92 94 96 98 00 02 04 06 08 10
intel 486intel pent iumintel pent ium 2intel pent ium 3intel pent ium 4intel itaniumAlpha 21064Alpha 21164Alpha 21264SparcSuperSparcSparc64M ipsHP PAPower PCAM D K6AM D K7AM D x86-64IBM PowerSUN UltraSPARCIntel Core 2AM D OpteronAM D Phenom
6
Moore’s Original Issues
Design costPower dissipationWhat to do with all the functionality possible
ftp://download.intel.com/research/silicon/moorespaper.pdf
7
Scaling MOS Devices
In this ideal scaling• V scales to αV, L scales to αL• So C scales to αC, i scales to αi (i/μ is stable)• Delay = CV/I scales as α• Energy = CV2 scales as α3
JSSC Oct 74, pg 256
8
Processor Power
1
10
100
1000
88 90 92 94 96 98 00 02 04 06 08 10
Watts
9
Power Density
0.01
0.10
1.00
85 87 89 91 93 95 97 99 01 03 05 07 09
Watts/mm2
10
Why Power Increased
Clock Frequency (MHz)
10
100
1000
10000
85 87 89 91 93 95 97 99 01 03 05 07 09
11
Good News
Die growth & super frequency scaling have stopped
Cycle in FO4
10
100
85 87 89 91 93 95 97 99 01 03 05
12
Processor Power
They were high power too
1
10
100
85 87 89 91 93 95 97 99 01 03
13
Bad News
Voltage scaling has stopped as well• kT/q does not scale
• Vth scaling has power consequences
If Vdd does not scale• Energy scales slowly
Ed Nowak, IBM
14
Technology Scaling Today
Device sizes are still scaling• Cost/device is still scaling down• This is what is driving scaling
Voltages are not scaling very fast• Threshold voltages set by leakage• Gate oxide thickness is set by leakage
This means that the channel lengths are not scalingCurrent is increasing by stressing silicon
Now Vdd and Vth are set by optimization
15
Other Technologies
For computing, I am not optimistic
Current problems are set by Physics:• Vdd set by kT/q
Sets the on-off ratio• Wire energy by CVdd2
To get around these limitations• Need to create something very different!
16
Problem with Different Technologies
Design processes have been optimized for silicon• Working on making it better for over 30 years
Silicon has set:• Notions of logic (binary signals), digital design styles• Computing (distinct memory and logic)• Relative size and speed of memory logic
No new technology will fit this mold well• Changing the world is hard• If you build it, generally they don’t come
Unless they absolutely have to
17
Maturing of Silicon
Silicon will not disappear• It will still be a huge business
Growth rate is slower, Eventually very slow scaling
Silicon will become like concrete and steel• Basis of a huge industry• Critical to nearly everything• But fairly stable and predictable
Will remain the dominate substrate for computing• And performance be limited by power dissipation
18
Optimizing Energy
Every design is a point on a 2-D plane
Performance
Ener
gy
19
Optimizing Energy
Every design is a point on a 2-D plane
Performance
Ener
gy
20
Optimizing Energy
Every design is a point on a 2-D plane
Performance
Ener
gy
21
Years of Low Power Research …
Shown only one design technique to reduce power• Reduce waste
Can waste• Energy (clock gating, leakage control, etc)• Performance
Adding additional constraints to operation flow
If technology scaling has stalled• Need to focus on reducing waste in our systems
Increase in efficiency in our designs will set performance
22
Future Systems
Some simple math• Assume scaling continues• Dies don’t shrink in size• Average power/gate must decrease by 2x / gen
Or need to build systems that increase in power
Since gates are shrinking in size• Get 1.4x from capacitive reduction• Where is the other factor of 1.4x ?
23
0.01
0.1
1
1 10 100 1000
Spec2000 *L
Wat
ts/S
pec*
L*Vd
d2̂
intel 386
intel 486
intel pentium
intel pentium 2
intel pentium 3
intel pentium 4
intel i tanium
Alpha 21064
Alpha 21164
Alpha 21264
Spar c
Super Spar c
Spar c64
Mips
HP PA
Power PC
AMD K6
AMD K7
AMD x86-64
The Push for Parallelism
10 parallel processors
24
Exploit Parallelism / Scale Vdd
If you have parallelism• Add more function units
Fill up new die (2x)• Lower energy/op
ΔE/ΔP will decreaseVdd, sizes, etc will reduceBuild simpler architectures
Works well when ΔE/ΔP is large• But what happens when that runs out?
PerformanceEn
ergy
/op
25
Problem Reformulation
Best way to save energy is to do less work• Energy directly reduced by the reduction in work• But required time for the function decreases as well
Convert this into extra power gains• Shifts the optimal curve down and to the right
User Performance
Ener
gy/o
p
26
Exploit Specialization
Optimize execution units for specific applications• Reformulate the hardware to reduce needed work• Can improve energy efficiency for a class of applications
DSP/Vector engines are more efficient than CPUs• Exploit locality, reuse• High compute density
ASICs are more efficient than DSP/Vector engines• If we want efficiency, we need more application optimization
27
ASIC/SOC Design Trends
Rising non-recurring engineering costs• Increasing design complexity
• Growing verification complexity
• Challenging physical design
• Rising mask costs
Few markets can justify ASIC NRE
28
29
ASIC Future Depends on Your Religion
Believe in correct by construction?
Believe in a generic high-level design language?
Historically both have not worked• I believe history is correct
Allowing people to connect complex blocks• Yields a complex validation problem, and a $20M+ design• General SoC, SiP will never be cheap
30
Computing’s Future:
Create a new universal computing platform• That is more efficient that today/tomorrow’s CMP• Bill Dally is working on this one
Leverage existing large volume processors for other applications• GPUs moving into general processing• OMAP being distributed as Unix system
31
Can We Do Better?
Chip design is expensive since chips are complex
But the building blocks are well known• Many of the optimizations are well known too• Designer often do many of the same steps• Part of the reason for off-shoring
Don’t need experience
Getting the system to work is hard• There is a lot of turning the crank that is needed
Can we automate some of the crank turning?
32
Chip Generator Idea
Application
ASIC design Configure a programmable
chip
Configure / program a virtual
chip
ConfiguredSystem
Custom System
Generate optimized chip
Semi-customSystem
NotEfficient
$$$
Process:
Final Product:
33
Another Way To Put It…
Per App
Perf
orm
ance
per W
att
•Best perf’/watt per app’
•Highest costs
•Specific for a single app’
ASIC
•Worse perf’/watt per app’
•Amortized costs
•Wider app’ domain
Program-able
•Excel’ perf’/watt per app’
•Amortized costs
•Wide app’ domain
CMPGenerator
AmortizedCost Structure
34
Processor
T D
T D
D
M
Crossbar
Processor
D T
D M
M
M
D M
M M
Smart Memories - A “Pretend” Generator
Processor
M M
M M
M
M
Crossbar
Processor
M M
M M
M
M
M M
M M
Data Cache
Instruction Cache
FU FU
Smart Memories Architecture: Single Tile Chip Generator Derivative
Looks Promising
Large energy / performance gains are possible:• Use H.264 as example application• Use a GP-CMP chip generator
400-600X initial perf. gap
36
New And Exciting Challenges
Chip Multiprocessor
Generator
Is it useful? How much better than CMP? How
much worse than ASIC?
How do the user use a generator? How would a generator be built? How
is the chip specified?
Given an application and a target (power /
perf.) – How do we find the best “chip plan”?
Verification of a chip is difficult. How do we verify a generator?
Potential Benefits (Wajahat, Rehan, Megan)
Optimization(Omid, Pete)
Process (Ofer)
Verification (Megan, Ofer)
37
Conclusions
The technology engine driving IT is slowing down• Power efficiency is the real problem
Application optimization leads to efficiency• But design is too expensive today to do this
Need to rethink design• Build chip generators not chips
These are virtual programmable chipsHave tools generates the real chips that customers want