® 1 exponential challenges, exponential rewards the future of moores law shekhar borkar intel...
TRANSCRIPT
RR
®® 1
Exponential Challenges, Exponential Challenges, Exponential Rewards—Exponential Rewards—
The Future of Moore’s LawThe Future of Moore’s Law
Shekhar BorkarShekhar Borkar
Intel FellowIntel Fellow
Circuit Research, Intel LabsCircuit Research, Intel Labs
Fall, 2004Fall, 2004
2
ISSCC 2003—ISSCC 2003—Gordon Moore said…Gordon Moore said…
““No exponential is forever…No exponential is forever…
ButBut
We can delay We can delay Forever”Forever”
3
OutlineOutline
The The exponentialexponential challenges challengesCircuit and Circuit and Arch solutionsArch solutionsMajor paradigm shifts in designMajor paradigm shifts in design Integration & SOCIntegration & SOCThe The exponentialexponential reward rewardSummarySummary
4
Goal: 1TIPS by 2010Goal: 1TIPS by 2010
0.01
0.1
1
10
100
1000
10000
100000
1000000
1970 1980 1990 2000 2010
MIP
S
Pentium® Pro Architecture
Pentium® 4 Architecture
Pentium® Architecture
486386
2868086
How do you get there?How do you get there?How do you get there?How do you get there?
5
Technology ScalingTechnology ScalingGATE
SOURCE
BODY
DRAIN
Xj
ToxD
GATE
SOURCE DRAIN
Leff
BODY
Dimensions scale Dimensions scale down by 30%down by 30%
Doubles transistor Doubles transistor densitydensity
Oxide thickness Oxide thickness scales downscales down
Faster transistor, Faster transistor, higher performancehigher performance
Vdd & Vt scalingVdd & Vt scaling Lower active powerLower active power
Technology has scaled well, will it in the future?Technology has scaled well, will it in the future?Technology has scaled well, will it in the future?Technology has scaled well, will it in the future?
6
Technology OutlookTechnology OutlookHigh Volume High Volume ManufacturingManufacturing
20042004 20062006 20082008 20102010 20122012 20142014 20162016 20182018
Technology Node Technology Node (nm)(nm)
9090 6565 4545 3232 2222 1616 1111 88
Integration Integration Capacity (BT)Capacity (BT)
2 4 8 16 32 64 128 256
Delay = CV/I Delay = CV/I scalingscaling
0.70.7 ~0.7~0.7 >0.7>0.7 Delay scaling will slow downDelay scaling will slow down
Energy/Logic Op Energy/Logic Op scalingscaling
>0.35>0.35 >0.5>0.5 >0.5>0.5 Energy scaling will slow downEnergy scaling will slow down
Bulk Planar CMOSBulk Planar CMOS High Probability Low ProbabilityHigh Probability Low Probability
Alternate, 3G etcAlternate, 3G etc Low Probability High ProbabilityLow Probability High Probability
VariabilityVariability Medium High Very HighMedium High Very High
ILD (K)ILD (K) ~3~3 <3<3 Reduce slowly towards 2-2.5Reduce slowly towards 2-2.5
RC DelayRC Delay 11 11 11 11 11 11 11 11
Metal LayersMetal Layers 6-76-7 7-87-8 8-98-9 0.5 to 1 layer per generation0.5 to 1 layer per generation
7
Is Transistor a Good Switch?Is Transistor a Good Switch?
On
I = ∞
I = 0
Off
I = 0
I = 0
I ≠ 0
I = 1ma/u
I ≠ 0
I ≠ 0Sub-threshold Leakage
8
Exponential Challenge #1Exponential Challenge #1
9
Sub-threshold LeakageSub-threshold Leakage
Sub-threshold leakage increases exponentiallySub-threshold leakage increases exponentiallySub-threshold leakage increases exponentiallySub-threshold leakage increases exponentially
1
10
100
1000
10000
30 50 70 90 110 130
Temp (C)
Ioff
(n
a/u
)
0.25u
45nm
Assume:
0.25m, Ioff = 1na/5X increase each generation at 30ºC
10
SD Leakage PowerSD Leakage Power
0.1
1
10
100
1000
0.25u 0.18u 0.13u 90nm 65nm 45nm
Technology
SD
Lea
kag
e (W
atts
)
2X Tr Growth1.5X Tr Growth
SD leakage power becomes prohibitiveSD leakage power becomes prohibitiveSD leakage power becomes prohibitiveSD leakage power becomes prohibitive
11
Leakage PowerLeakage Power
0%
10%
20%
30%
40%
50%
1.5 0.7 0.35 0.18 0.09 0.05
Technology ()
Lea
kag
e P
ow
er(%
of
To
tal)
Must stopat 50%
Leakage power limits Vt scalingLeakage power limits Vt scalingLeakage power limits Vt scalingLeakage power limits Vt scaling
A. Grove, IEDM 2002
12
Exponential Challenge #2Exponential Challenge #2
13
Gate Oxide is Near LimitGate Oxide is Near Limit
High-K dielectric is crucialHigh-K dielectric is crucialHigh-K dielectric is crucialHigh-K dielectric is crucial
1.E-031.E-021.E-011.E+001.E+011.E+021.E+031.E+041.E+051.E+06
0.25u 0.18u 0.13u 90nm 65nm 45nm
Technology
Gat
e L
eaka
ge
(Wat
ts)
1.5X
2X
During Burn-in1.4X Vdd
90nm MOS Transistor
50nm50nm
Silicon substrateSilicon substrate
1.2 nm SiO1.2 nm SiO22
GateGateIf Tox scaling slows down, then Vdd If Tox scaling slows down, then Vdd scaling will have to slow downscaling will have to slow down
14
Exponential Challenge #3Exponential Challenge #3
15
Energy per Logic OperationEnergy per Logic Operation
1.E-08
1.E-07
1.E-06
1.E-05
1.E-04
1.E-03
1.E-02
1.E-01
1.E+00
10 5 2 1 0.5 0.25 0.13 0.07
Technology ()
En
erg
y/L
og
ic O
per
atio
n
(No
rmal
ized
)
This was Good
Slow down
Energy per logic operation scaling will slow downEnergy per logic operation scaling will slow downEnergy per logic operation scaling will slow downEnergy per logic operation scaling will slow down
16
The Power CrisisThe Power Crisis
0
200
400
600
800
1000
1200
0.25u 0.18u 0.13u 90nm 65nm 45nm
Po
wer
(W
)
Leakage
Active
15 mm Die
Business as usualBusiness as usual is not an option is not an optionBusiness as usualBusiness as usual is not an option is not an option
17
Exponential Challenge #4Exponential Challenge #4
18
Sources of VariationsSources of Variations
0
50
100
150
200
250
He
at
Flu
x (
W/c
m2
)
Heat Flux (W/cm2)Results in Vcc variation
40
50
60
70
80
90
100
110
Te
mp
era
ture
(C
)
Temperature Variation (°C)Hot spots
10
100
1000
10000
1000 500 250 130 65 32
Technology Node (nm)
Me
an
Nu
mb
er
of
Do
pa
nt
Ato
ms
Random Dopant Fluctuations
0.01
0.1
1
1980 1990 2000 2010 2020
micron
10
100
1000
nm193nm193nm
248nm248nm365nm365nm
LithographyLithographyWavelengthWavelength
65nm65nm90nm90nm
130nm130nm
GenerationGeneration
GapGap
45nm45nm32nm32nm 13nm 13nm
EUVEUV
180nm180nm
Source: Mark Bohr, Intel
Sub-wavelength Lithography
19
Frequency & SD LeakageFrequency & SD Leakage
0.9
1.0
1.1
1.2
1.3
1.4
0 5 10 15 20
Normalized Leakage (Isb)
No
rmal
ized
Fre
qu
ency
0.18 micron~1000 samples
20X30%
Low FreqLow Isb
High FreqMedium Isb
High FreqHigh Isb
20
0
20
40
60
80
100
120
-39.71 -25.27 -10.83 3.61 18.05 32.49
VTn(mv)
# o
f C
hip
s
~30mV
Vt DistributionVt Distribution
0.18 micron~1000 samples
Low FreqLow Isb
High FreqMedium Isb
High FreqHigh Isb
21
Exponential Challenge #5Exponential Challenge #5
22
Platform RequirementsPlatform Requirements
0
500
1000
1500
2000
2500
3000
PC tower Mini towertower Slim line Small pcSys
tem
Vo
lum
e (
cub
ic i
nch
)
Shrinking volumeShrinking volume
QuieterQuieter
Yet, High PerformanceYet, High Performance
0
0.5
1.0
1.5
0 50 100 150 200Power (W)
Th
erm
al
Bu
dg
et
(oC
/W)
0
25
50
75
Hea
t-S
ink
Vo
lum
e (
in3)
Projected Heat Dissipatio
n Volume
Projected Air Flow Rate
Pentium ® III
100
250
Thermal Budget Air
Flo
w R
ate
(C
FM
)
Pentium ® 4
Thermal budget Thermal budget decreasingdecreasing
Higher heat sink volumeHigher heat sink volume
Higher air flow rateHigher air flow rate
23
Exponential Challenge #6Exponential Challenge #6
24
Exponential CostsExponential Costs
$1
$10
$100
$1,000
$10,000
$100,000
1960 1970 1980 1990 2000 2010
Lit
ho
To
ol
Co
st (
$K)
G. MooreISSCC 03
Litho Cost
$1
$10
$100
$1,000
$10,000
1960 1970 1980 1990 2000 2010
Fab
Co
st (
$M)
www.icknowledge.com
FAB Cost
1.E-06
1.E-05
1.E-04
1.E-03
1.E-02
1.E-01
1960 1970 1980 1990 2000 2010
$/T
ran
sist
or
$ per Transistor
1.E-02
1.E-01
1.E+00
1.E+01
1.E+02
1.E+03
1.E+04
1960 1970 1980 1990 2000 2010
$/M
IPs
$ per MIPS
25
Product Cost PressureProduct Cost Pressure
Shrinking ASP, and shrinking $ budget for powerShrinking ASP, and shrinking $ budget for powerShrinking ASP, and shrinking $ budget for powerShrinking ASP, and shrinking $ budget for power
0
200
400
600
800
1000
1200
1400
1999 2000 2001 2002 2003 2004 2005
Des
k T
op
AS
P (
$)
Source: IDC
Other
Thermal
$10
Power Delivery
$25
Platform Budget
26
Must Fit in Power EnvelopeMust Fit in Power Envelope
Technology, Circuits, and Technology, Circuits, and Architecture to constrain the power Architecture to constrain the power
Technology, Circuits, and Technology, Circuits, and Architecture to constrain the power Architecture to constrain the power
0
200
400
600
800
1000
1200
1400
90nm 65nm 45nm 32nm 22nm 16nm
Po
wer
(W
), P
ow
er D
ensi
ty (
W/c
m2)
SiO2 Lkg
SD Lkg
Active
10 mm Die
1 MB2 MB
4 MB
8 MB
27
Some ImplicationsSome Implications Tox scaling will slow Tox scaling will slow
down—may stop?down—may stop? Vdd scaling will slow Vdd scaling will slow
down—may stop?down—may stop? Vt scaling will slow Vt scaling will slow
down—may stop?down—may stop? Approaching Approaching
constant Vdd scalingconstant Vdd scaling Energy/logic op will Energy/logic op will
not scalenot scale
0.1
1
10
100
10 3 1 0.35 0.13 0.05
Technology ()
Vd
d (
Vo
lts
)
~1 Volt
1.E-081.E-071.E-061.E-051.E-041.E-031.E-021.E-011.E+00
10 5 2 1 0.5 0.25 0.13 0.07
Technology ()
En
erg
y/L
og
ic O
pe
rati
on
(N
orm
ali
zed
)
Slow Down?
28
The Gigascale DilemmaThe Gigascale Dilemma
1B T integration capacity will be available1B T integration capacity will be available But could be unusable due to powerBut could be unusable due to power Logic T growth will slow downLogic T growth will slow down Transistor performance will be limitedTransistor performance will be limited
SolutionsSolutions Low power design techniquesLow power design techniques Improve design efficiency—Multi everywhereImprove design efficiency—Multi everywhere Valued performance by even higher Valued performance by even higher
integration integration (of potentially slower transistors)(of potentially slower transistors)
29
Power—active and leakagePower—active and leakageVariationsVariationsMicroarchitectureMicroarchitecture
30
Active Power ReductionActive Power Reduction
SlowSlow FastFast SlowSlow
Lo
w S
up
ply
V
olt
age
Hig
h S
up
ply
V
olt
age
Multiple Supply Voltages
Logic BlockLogic BlockFreq = 1Vdd = 1Throughput = 1Power = 1Area = 1 Pwr Den = 1
Vdd
Logic BlockLogic Block
Freq = 0.5Vdd = 0.5Throughput = 1Power = 0.25Area = 2Pwr Den = 0.125
Vdd/2
Logic BlockLogic Block
Replicated Designs
31
Leakage ControlLeakage ControlBody Bias
VddVbp
Vbn-Ve
+Ve
2-10X2-10XReductionReduction
Sleep Transistor
Logic BlockLogic Block
2-1000X2-1000XReductionReduction
Stack Effect
Equal Loading
5-10X5-10XReductionReduction
32
Circuit Design TradeoffsCircuit Design Tradeoffs
0
0.5
1
1.5
2
Low-Vt usagelow high
Higher probability of target frequency with:Higher probability of target frequency with:1.1. Larger transistor sizes Larger transistor sizes 2.2. Higher Low-Vt usageHigher Low-Vt usage
But with power penaltyBut with power penalty
0
0.5
1
1.5
2
Transistor sizesmall large
power
target frequency probability
33
Impact of Critical PathsImpact of Critical Paths
1.1
1.2
1.3
1.4
1 9 17 25
# of critical paths
Mea
n c
lock
fre
qu
en
cy
Clock frequency
Nu
mb
er
of
die
s
0%
20%
40%
60%
0.9 1.1 1.3 1.5
# critical paths
With increasing # of critical pathsWith increasing # of critical paths–Both Both and and become smaller become smaller
–Lower mean frequencyLower mean frequency
34
Impact of Logic DepthImpact of Logic Depth
0%
20%
40%
-16% -8% 0% 8% 16%
Delay
20%
40% NMOSPMOS
Device IONN
um
ber
of
sam
ple
s (%
)
Variation (%)
NMOS Ion
PMOS Ion/
Delay/
5.6% 3.0% 4.2%
Logic depth: 16
0.0
0.5
1.0
Logic depth
Rat
io o
fd
elay
- t
o Io
n-
16 49
35
0
0.5
1
1.5
Logic depthsmalllarge
frequency
target frequency probability # uArch critical paths
0
0.5
1
1.5
less more
Architecture TradeoffsArchitecture Tradeoffs
Higher target frequency with:Higher target frequency with:1.1. Shallow logic depthShallow logic depth2.2. Larger number of critical pathsLarger number of critical paths
But with lower probabilityBut with lower probability
36
Variation-tolerant DesignVariation-tolerant Design
0
0.5
1
1.5
# uArch critical pathsless more
Balance power & frequenc
y with variation tolerance
0
0.5
1
1.5
Logic depthsmalllarge
frequency
target frequency probability
0
0.5
1
1.5
2
Transistor sizesmall large
power
target frequency probability
0
0.5
1
1.5
2
Low-Vt usagelow high
37
Leakage PowerLeakage Power
Fre
qu
en
cyF
req
ue
ncy
DeterministicDeterministic
ProbabilisticProbabilistic10X variation 10X variation
~50% total power~50% total power
Probabilistic DesignProbabilistic Design
DelayDelayPath DelayPath Delay P
rob
abili
tyP
rob
abili
tyDeterministic design techniques inadequate in the futureDeterministic design techniques inadequate in the futureDeterministic design techniques inadequate in the futureDeterministic design techniques inadequate in the future
Due to Due to variations in:variations in:VVdddd, V, Vtt, and , and
TempTemp
Delay TargetDelay Target
# o
f P
ath
s#
of
Pa
ths
DeterministicDeterministic
Delay TargetDelay Target
# o
f P
ath
s#
of
Pa
ths
ProbabilisticProbabilistic
38
Shift in Design ParadigmShift in Design Paradigm
Multi-variable design optimization for:Multi-variable design optimization for:– Yield and bin splits Yield and bin splits
– Parameter variations Parameter variations
– Active and leakage powerActive and leakage power
– Performance Performance
Tomorrow:Tomorrow:Global OptimizationGlobal Optimization
Multi-variateMulti-variate
Today:Today:Local OptimizationLocal Optimization
Single VariableSingle Variable
39
Resistor Network
4.5 mm
5.3
mm
Multiplesubsites PD & Counter Resistor
Network
CUT Bias Amplifier
Delay
Die frequency: Min(F1..F21)
Die power: Sum(P1..P21)
Technology 150nm CMOS
Number of subsites per die
21
Body bias range0.5V FBB to 0.5V RBB
Bias resolution 32 mV
1.6 X 0.24 mm, 21 sites per die150nm CMOS
Adaptive Body Bias--ExperimentAdaptive Body Bias--Experiment
40
Adaptive Body Bias--ResultsAdaptive Body Bias--Results
0%
20%
60%
100%
Ac
cep
ted
die
noBB
100% yield
ABB
Higher Frequency
Nu
mb
er
of
die
s
Frequency
too slow
ftarget
too leaky
ftarget
ABB
FBB RBB
Nu
mb
er
of
die
s
Frequency
too slow
ftarget
too leaky
ftarget
ABB
FBB RBB
97% highest bin
within die ABB
For given Freq and Power densityFor given Freq and Power density
• 100% yield with ABB 100% yield with ABB
• 97% highest freq bin with ABB for 97% highest freq bin with ABB for within die variability within die variability
41
Design & Design & Arch EfficiencyArch Efficiency
0
1
2
3
4
S-Scalar Dynamic DeepPipeline
Gro
wth
(X
) fr
om
pre
vio
us
uA
rch
Die AreaPerformancePower
Same Process Technology
0%
20%
40%
S-Scalar Dynamic DeepPipeline
Red
uct
ion
in
MIP
S/W
att Same Process Technology
Enegry efficiency drops ~20%
Employ efficient design & Employ efficient design & ArchitecturesArchitecturesEmploy efficient design & Employ efficient design & ArchitecturesArchitectures
42
Memory LatencyMemory Latency
MemoryMemoryCPUCPU CacheCache
Small~few Clocks Large
50-100ns1
10
100
1000
100 1000 10000
Freq (MHz)M
em
ory
Late
ncy (
Clo
cks)
Assume: 50ns Memory latency
Cache miss hurts performanceCache miss hurts performanceWorse at higher frequencyWorse at higher frequency
Cache miss hurts performanceCache miss hurts performanceWorse at higher frequencyWorse at higher frequency
43
Increase on-die MemoryIncrease on-die Memory
Large on die memory provides:
1. Increased Data Bandwidth & Reduced Latency
2. Hence, higher performance for much lower power
1
10
100
Po
wer
Den
sity
(W
atts
/cm
2)
Logic
Memory
0%
25%
50%
75%
100%
1u 0.5u 0.25u 0.13u 65nmC
ach
e %
of
To
tal
Are
a
486Pentium®
Pentium® III
Pentium® 4
Pentium® M
44
Multi-threadingMulti-threading
0%
20%
40%
60%
80%
100%
100% 98% 96%
Cache Hit %
Per
form
ance
1 GHz
2 GHz
3 GHz
STST Wait for MemWait for Mem
MT1MT1 Wait for MemWait for Mem
MT2MT2 WaitWait
MT3MT3
Single ThreadSingle Thread
Multi-ThreadingMulti-Threading
Full HW UtilizationFull HW Utilization
Multi-threading improves performance without Multi-threading improves performance without impacting thermals & power deliveryimpacting thermals & power delivery
Multi-threading improves performance without Multi-threading improves performance without impacting thermals & power deliveryimpacting thermals & power delivery
Thermals & Power Delivery Thermals & Power Delivery designed for full HW utilizationdesigned for full HW utilizationThermals & Power Delivery Thermals & Power Delivery designed for full HW utilizationdesigned for full HW utilization
45
Chip Multi-ProcessingChip Multi-Processing
1
1.5
2
2.5
3
3.5
1 2 3 4Die Area, Power
Rel
ativ
e P
erfo
rman
ce
Multi Core
Single Core
C1C1 C2C2
C3C3 C4C4
Cache
• Multi-core, each core Multi-threaded• Shared cache and front side bus• Each core has different Vdd & Freq• Core hopping to spread hot spots• Lower junction temperature
46
Special Purpose HardwareSpecial Purpose HardwareT
CB Exec
Core
PLL
OOO
ROMC
AM
1
TC
B ExecCore
PLL
ROB
ROMC
LB
Inputseq
Sendbuffer
2.23 mm X 3.54 mm, 260K transistors
Opportunities:Network processing enginesMPEG Encode/Decode enginesSpeech engines
1.E+02
1.E+03
1.E+04
1.E+05
1.E+06
1995 2000 2005 2010 2015
MIP
S
GP MIPS@75W
TOE MIPS@~2W
Special purpose HW—Best Mips/WattSpecial purpose HW—Best Mips/WattSpecial purpose HW—Best Mips/WattSpecial purpose HW—Best Mips/Watt
TCP Offload EngineTCP Offload Engine
47
Valued Performance: SOC Valued Performance: SOC (System on a Chip)(System on a Chip)
Special-purpose hardware Special-purpose hardware more MIPS/mm² more MIPS/mm² SIMD integer and FP instructions in several ISAsSIMD integer and FP instructions in several ISAs
Die AreaDie Area PowerPower PerformancePerformance
General General PurposePurpose 2X2X 2X2X ~1.4X~1.4X
Multimedia Multimedia KernelsKernels <10%<10% <10%<10% 1.5 - 4X1.5 - 4X
Si Monolithic Polylithic
CPUCPU
SpecialHW
SpecialHW
MemoryMemory
CMOS RFCMOS RF
WirelineWireline
RFRF
Opto-Electronics
Opto-Electronics
Dense MemoryDense
Memory
HeterogeneousSi, SiGe, GaAs
48
The Exponential RewardThe Exponential Reward
0.01
0.1
1
10
100
1000
10000
100000
1000000
1970 1980 1990 2000 2010
MIP
S
Multi-everywhere: MT, CMPMulti-everywhere: MT, CMPMulti-everywhere: MT, CMPMulti-everywhere: MT, CMP
Speculative, OOO
Era of Era of Instruction Instruction
LevelLevelParallelismParallelism
Super Scalar
486386
2868086 Era of Era of
PipelinedPipelinedArchitectureArchitecture
Multi ThreadedEra of Era of
Thread &Thread &ProcessorProcessor
LevelLevelParallelismParallelism
Special Special Purpose HWPurpose HW
Multi-Threaded, Multi-Core
49
Summary—Delaying Summary—Delaying ForeverForever
Gigascale transistor integration capacity will Gigascale transistor integration capacity will be available—Power and Energy are the be available—Power and Energy are the barriersbarriers
Variations will be even more prominent—shift Variations will be even more prominent—shift from Deterministic to Probabilistic designfrom Deterministic to Probabilistic design
Improve design efficiencyImprove design efficiency Multi—everywhere, & SOC Multi—everywhere, & SOC valued valued
performanceperformance Exploit integration capacity to deliver Exploit integration capacity to deliver
performance in power/cost envelopeperformance in power/cost envelope