Exponential Challenges, Exponential Rewards: The Future of Moore’s Law
Based on a lecture by Shekhar Borkar
Intel Fellow
Circuit Research, Intel Labs
Goal: 1 TIPS by 2010
[Figure: processor performance in MIPS (log scale, 0.01 to 1,000,000) vs. year, 1970 to 2010, for the 8086, 286, 386, 486, Pentium® architecture, Pentium® Pro architecture, and Pentium® 4 architecture]
How do you get there?
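To see what the 1 TIPS goal implies, the sketch below computes the annual growth factor and doubling time needed to reach a target from a starting point. 1 TIPS equals 10^6 MIPS. The 1970 starting value of 0.01 MIPS is an illustrative assumption (the bottom of the chart's axis), not a measured processor figure, and `required_growth` is a hypothetical helper.

```python
import math


def required_growth(start_mips: float, target_mips: float, years: float) -> tuple[float, float]:
    """Return (annual growth factor, doubling time in years) needed to go
    from start_mips to target_mips in the given number of years."""
    factor = (target_mips / start_mips) ** (1.0 / years)
    doubling_years = math.log(2) / math.log(factor)
    return factor, doubling_years


# 1 TIPS = 10**6 MIPS. The 0.01 MIPS starting point in 1970 is an assumption
# taken from the bottom of the chart's axis, used only for illustration.
factor, doubling = required_growth(0.01, 1e6, 2010 - 1970)
print(f"growth per year: {factor:.2f}x, doubling every {doubling:.2f} years")
```

Under that assumption, eight orders of magnitude in 40 years works out to roughly a doubling every year and a half.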
Transistor Scaling
Will high K happen? Would you count on it?
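Technology Scaling
[Figure: MOSFET cross-sections (gate, source, drain, body) before and after scaling, labeled with junction depth Xj, gate-oxide thickness Tox, and effective channel length Leff]
Dimensions scale down by 30%: doubles transistor density
Oxide thickness scales down: faster transistor, higher performance
Vdd & Vt scaling: lower active power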
Technology has scaled well; will it in the future?
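The bullets above summarize ideal constant-field (Dennard) scaling. The sketch below works through one generation, assuming an exact 0.7x linear shrink and ideal scaling of capacitance, supply voltage, and gate delay. Real processes deviated from these ideal rules, which is the subject of the following slides.

```python
# Ideal constant-field (Dennard) scaling with a linear shrink factor of 0.7.
K = 0.7


def scale_one_generation(k: float = K) -> dict[str, float]:
    """Relative change of key quantities after one ideal shrink by k."""
    area = k * k                  # each transistor occupies k^2 of its old area
    density = 1 / area            # ~2x transistors per unit area
    capacitance = k               # C scales with linear dimensions
    voltage = k                   # Vdd (and Vt) scale with dimensions
    delay = k                     # gate delay ~ C*V/I, with I also scaling by k
    frequency = 1 / delay         # ~1.43x faster
    power = capacitance * voltage ** 2 * frequency  # P ~ C*V^2*f per transistor
    return {
        "density": density,
        "frequency": frequency,
        "power_per_transistor": power,
        "power_density": power * density,  # stays constant under ideal scaling
    }


for name, value in scale_one_generation().items():
    print(f"{name}: {value:.2f}x")
```

The key result is that power per transistor falls by about half while density doubles, so power density stays flat. That only holds while Vdd keeps scaling.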
Gate Oxide is Near Limit
[Figure: cross-section of a 130 nm transistor (70 nm dimension labeled; Si3N4 and CoSi2 layers) and a schematic MOSFET (gate, source, drain, body) showing the gate-oxide thickness Tox]
Will high K happen? Would you count on it?
Transistor Integration Capacity
[Figure: transistors (millions, log scale from 0.001 to 1000) vs. technology node (µm, from 10 down to 0.045); annotation: 1 Billion]
On track for 1 billion transistor integration capacity
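Because area per transistor scales with the square of the feature size, the node list on the chart's x-axis implies the ideal density gain. The sketch below computes it per step and overall. This is an idealization: real density gains also depend on design rules and layout, and not every step on the axis is exactly 0.7x.

```python
# Technology nodes from the chart's x-axis, in micrometers.
NODES_UM = [10, 7, 5, 3, 2, 1.5, 1, 0.7, 0.5, 0.35, 0.25, 0.18, 0.13, 0.09, 0.065, 0.045]


def density_gain(old_um: float, new_um: float) -> float:
    """Ideal density gain when feature size shrinks from old_um to new_um
    (area per transistor scales with the square of the feature size)."""
    return (old_um / new_um) ** 2


for old, new in zip(NODES_UM, NODES_UM[1:]):
    print(f"{old} um -> {new} um: {density_gain(old, new):.2f}x")

total = density_gain(NODES_UM[0], NODES_UM[-1])
print(f"10 um -> 0.045 um overall: about {total:,.0f}x")
```

Most steps give roughly a 2x gain, and the whole span gives nearly five orders of magnitude.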
35 Years of Microprocessor Trend
C. Moore, Data Processing in ExaScale-Class Computer Systems, Salishan, April 2011
Is Transistor a Good Switch?
[Figure: ideal switch vs. transistor. Ideal switch: I = ∞ when on, I = 0 when off. Transistor: I ≈ 1 mA/µm when on; I ≠ 0 when off, due to sub-threshold leakage]
Sub-threshold Leakage
Sub-threshold leakage increases exponentially
[Figure: Ioff (nA/µm, log scale from 1 to 10,000) vs. temperature (30 °C to 130 °C), for technologies from 0.25 µm to 45 nm]
Assume: 0.25 µm, Ioff = 1 nA/µm; 5X increase each generation at 30 °C
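The slide's stated assumption (1 nA/µm at 0.25 µm, growing 5X per generation at 30 °C) can be tabulated directly. In the sketch below, the intermediate generation names are taken from the Power Crisis chart's axis. Temperature dependence, which the chart shows is also exponential, is not modeled here.

```python
# Slide assumptions: Ioff = 1 nA/um at 0.25 um, growing 5x per generation (at 30 C).
GENERATIONS = ["0.25um", "0.18um", "0.13um", "90nm", "65nm", "45nm"]
IOFF_BASE_NA_PER_UM = 1.0
GROWTH_PER_GENERATION = 5.0


def ioff_by_generation() -> dict[str, float]:
    """Sub-threshold leakage (nA/um) at 30 C for each generation."""
    return {
        gen: IOFF_BASE_NA_PER_UM * GROWTH_PER_GENERATION ** i
        for i, gen in enumerate(GENERATIONS)
    }


for gen, ioff in ioff_by_generation().items():
    print(f"{gen:>7}: {ioff:7.0f} nA/um")
```

Five generations at 5X each multiply leakage per micron of width by more than three orders of magnitude, before any temperature rise.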
Leakage Power
[Figure: leakage power as % of total power (0% to 50%) vs. technology node (µm, from 1.5 down to 0.045); annotation: must stop at 50%]
Leakage power limits Vt scaling
A. Grove, IEDM 2002
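Leakage limits Vt scaling because sub-threshold current depends exponentially on Vt. The standard relation is I ∝ 10^((Vgs − Vt)/S), where the swing S = n·(kT/q)·ln(10) is the gate-voltage change per decade of current. The sketch below assumes a slope factor n = 1.5. This value is process-dependent and is chosen only for illustration.

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def subthreshold_swing_mv(temp_c: float, n: float) -> float:
    """Gate-voltage change (mV) per decade of sub-threshold current:
    S = n * (kT/q) * ln(10)."""
    thermal_voltage = BOLTZMANN * (temp_c + 273.15) / ELECTRON_CHARGE
    return n * thermal_voltage * math.log(10) * 1000


def leakage_multiplier(vt_reduction_mv: float, temp_c: float = 30.0, n: float = 1.5) -> float:
    """Factor by which Ioff grows when Vt is lowered by vt_reduction_mv.
    n = 1.5 is an assumed slope factor, not a value from the lecture."""
    return 10 ** (vt_reduction_mv / subthreshold_swing_mv(temp_c, n))


print(f"S at 30 C: {subthreshold_swing_mv(30.0, 1.5):.0f} mV/decade")
print(f"Lowering Vt by 100 mV: {leakage_multiplier(100):.1f}x more leakage")
```

Every reduction of Vt by one swing (on the order of 60 to 100 mV) costs a 10x increase in off-current. S also grows with temperature, which is why leakage rises with temperature.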
The Power Crisis
[Figure: power (W, 0 to 1200) of a 15 mm die, split into active and leakage power, for 0.25 µm, 0.18 µm, 0.13 µm, 90 nm, 65 nm, and 45 nm technologies]
How Power Should Have Scaled
A. Danowitz et al., CPU DB: Recording Microprocessor History, ACM Queue (Processors), vol. 10, issue 4, pp. 1-18, 2012
Impact on Path Delays
[Figure: probability distribution of path delay]
Path delay variability due to technology variations impacts individual circuit performance and power
Due to variations in: Vdd, Vt, and temperature
Optimize each circuit for performance and power
How many silicon atoms (111 pm) fit across a transistor channel (20 nm)? Are 3D transistors a solution?
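A back-of-envelope answer to the question is to divide the channel length by the atom size. This is a one-dimensional count that ignores the crystal lattice spacing. Because 111 pm is close to silicon's covalent radius, the sketch also shows the count when 111 pm is read as a radius (diameter 222 pm). `atoms_across` is a hypothetical helper.

```python
def atoms_across(channel_nm: float, atom_size_pm: float) -> float:
    """Number of atoms of the given size that fit in a row across the channel."""
    return channel_nm * 1000 / atom_size_pm


# Figures from the question: 20 nm channel, 111 pm for silicon.
print(f"{atoms_across(20, 111):.0f} atoms (treating 111 pm as the atom's size)")
print(f"{atoms_across(20, 2 * 111):.0f} atoms (treating 111 pm as the radius)")
```

With only about 90 to 180 atoms across the channel, a single misplaced atom is roughly a 1% change. This gives a sense of why atomic-scale variations become visible in device behavior.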
Shift in Design Paradigm
From deterministic design to probabilistic and statistical design
– A path delay estimate is probabilistic (not deterministic)
Multi-variable design optimization for:
– Parameter variations
– Active and leakage power
– Performance
Exponential Costs
[Figure, four panels: Litho Tool Cost ($K, log scale, 1960 to 2010; source: G. Moore, ISSCC 03); FAB Cost ($M, log scale, 1960 to 2010; source: www.icknowledge.com); $ per Transistor (log scale, 1965 to 2005); $ per MIPS (log scale, 1965 to 2005)]
Some Implications
Tox scaling will slow down. Will it stop?
Vdd scaling will slow down. Will it stop?
Vt scaling will slow down. Will it stop?
Approaching constant Vdd scaling
Energy/logic op will not scale
[Figure: Vdd (volts, log scale) vs. technology node (µm, from 10 down to 0.045); annotation: ~1 Volt]
[Figure: energy per logic operation (normalized, log scale from 1E-08 to 1E+00) vs. technology node (µm, from 10 down to 0.045); annotation: slow down?]
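Switching energy per logic operation is proportional to C·Vdd². The sketch below compares full scaling with constant-Vdd scaling, assuming capacitance shrinks by 0.7x per generation and the 1/2 factor is dropped because the values are normalized. Once Vdd is stuck near ~1 V, only the capacitance term keeps shrinking.

```python
K = 0.7  # assumed linear shrink per generation


def energy_per_op(generations: int, scale_vdd: bool) -> float:
    """Normalized switching energy E ~ C * Vdd^2 after some generations.
    C shrinks by K each generation; Vdd shrinks by K only if scale_vdd."""
    c = K ** generations
    v = K ** generations if scale_vdd else 1.0
    return c * v ** 2


for g in range(5):
    print(f"gen {g}: scaled Vdd {energy_per_op(g, True):.3f}, "
          f"constant Vdd {energy_per_op(g, False):.3f}")
```

Energy per operation drops about 2.9x per generation when Vdd scales, but only about 1.4x when Vdd is held constant.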
The Terascale Dilemma
Many-billion-transistor integration capacity will be available
– But it could be unusable due to power
Logic transistor growth will slow down
Transistor performance will be limited
Solutions
– Low-power design techniques
– Improve design efficiency
Platform Requirements
[Figure: system volume (cubic inches, 0 to 3000) for PC tower, mini tower, µ-tower, slim line, and small PC form factors]
Shrinking volume
Quieter
Yet, high performance
[Figure: thermal budget (°C/W), projected heat-sink volume (in³), and projected air flow rate (CFM) vs. power (W, 0 to 200), with Pentium® III and Pentium® 4 marked]
Thermal budget decreasing
Higher heat sink volume
Higher air flow rate
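The thermal budget is the junction-to-ambient thermal resistance the cooling solution must achieve: θ = (Tj − Ta)/P. The sketch below uses hypothetical limits of 100 °C maximum junction temperature and 45 °C air inside the chassis, chosen only for illustration. It shows why the budget shrinks as power rises, which in turn demands bigger heat sinks and more air flow.

```python
def thermal_budget(t_junction_c: float, t_ambient_c: float, power_w: float) -> float:
    """Required junction-to-ambient thermal resistance (C/W) to keep the die
    at or below t_junction_c while dissipating power_w."""
    return (t_junction_c - t_ambient_c) / power_w


# Hypothetical limits: 100 C max junction, 45 C air inside the chassis.
for power in (50, 100, 150, 200):
    print(f"{power:>3} W -> {thermal_budget(100, 45, power):.2f} C/W")
```

Doubling the power halves the allowed thermal resistance. A smaller enclosure makes meeting that lower budget harder.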
Active Power Reduction
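[Figure: multiple Vdd. The slow blocks run at a low supply voltage and the fast block at a high supply voltage]
Multiple Vdd
[Figure: one logic block at Vdd vs. two parallel logic blocks at Vdd/2]
One logic block at Vdd: Freq = 1, Vdd = 1, Throughput = 1, Power = 1, Area = 1, Pwr Den = 1
Two logic blocks at Vdd/2: Freq = 0.5, Vdd = 0.5, Throughput = 1, Power = 0.25, Area = 2, Pwr Den = 0.125
Throughput-oriented design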
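The numbers on this slide follow from dynamic power P ∝ C·Vdd²·f, with capacitance and area proportional to the number of identical logic blocks. The sketch below reproduces them. It assumes, as the slide does, that the logic still reaches half frequency at Vdd/2. It ignores leakage and the fact that delay grows faster than 1/Vdd as Vdd approaches Vt. `dynamic_power` is a hypothetical helper.

```python
def dynamic_power(blocks: int, vdd: float, freq: float) -> dict[str, float]:
    """Normalized dynamic power P ~ C * Vdd^2 * f, with C and area proportional
    to the number of identical logic blocks; throughput ~ blocks * freq."""
    power = blocks * vdd ** 2 * freq
    area = float(blocks)
    return {
        "throughput": blocks * freq,
        "power": power,
        "area": area,
        "power_density": power / area,
    }


print("one block at Vdd:  ", dynamic_power(blocks=1, vdd=1.0, freq=1.0))
print("two blocks at Vdd/2:", dynamic_power(blocks=2, vdd=0.5, freq=0.5))
```

Throughput is unchanged while power drops 4x, at the cost of doubling the area.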
Design & µArch Efficiency
[Figure: growth (X) in die area, performance, and power relative to the previous µArch, for super-scalar, dynamic, and deep-pipeline microarchitectures in the same process technology]
[Figure: reduction in MIPS/Watt (0% to 40%) for super-scalar, dynamic, and deep-pipeline microarchitectures in the same process technology]
Energy efficiency drops ~20%
Employ efficient design & µArchitectures
Improve µArch Efficiency
[Figure: single thread (ST) vs. multi-threading. With a single thread, the core sits idle while waiting for memory; with threads MT1, MT2, and MT3, another thread runs during each memory wait]
Thermals & power delivery designed for full HW utilization
Multi-threading improves performance without impacting thermals & power delivery
Computer Architecture: A Quantitative Approach (Hennessy & Patterson, 2011)
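The single-thread vs. multi-threading picture can be reduced to a simple utilization model. Each thread alternates a burst of computation with a memory stall, and the core switches to another ready thread whenever the current one stalls. This idealized sketch ignores thread-switch cost and shared-cache contention. The 10-cycle and 30-cycle figures are hypothetical.

```python
def core_utilization(threads: int, compute_cycles: int, mem_wait_cycles: int) -> float:
    """Fraction of cycles the core does useful work when each thread alternates
    compute_cycles of work with mem_wait_cycles of memory stall, and the core
    switches to another ready thread whenever the current one stalls."""
    cycle = compute_cycles + mem_wait_cycles
    return min(1.0, threads * compute_cycles / cycle)


# Hypothetical workload: 10 cycles of work, then a 30-cycle memory stall.
for t in (1, 2, 3, 4):
    print(f"{t} thread(s): {core_utilization(t, 10, 30):.0%} busy")
```

Because the hardware is already designed for full utilization, filling the memory-wait gaps with other threads raises performance without a new thermal or power-delivery budget.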
Increase on-die Memory
[Figure: cache as % of full chip area (0% to 100%) vs. technology (0.7 µm to 0.10 µm) for Pentium, Pentium Pro, Pentium II, Pentium III, and Pentium® 4, with a question mark for future generations]
Large on-die memory provides:
1. Increased data bandwidth & reduced latency
2. Hence, higher performance for much lower power
[Figure: power density (watts/cm², log scale from 1 to 100) of logic vs. memory, 0.25 µm to 0.1 µm]
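Memory has a much lower power density than logic, so devoting more die area to cache lowers the die's average power density. The sketch below computes an area-weighted average. The 50 W/cm² (logic) and 5 W/cm² (memory) values are hypothetical placeholders, not readings from the chart.

```python
def chip_power_density(cache_fraction: float, logic_w_cm2: float, memory_w_cm2: float) -> float:
    """Area-weighted average power density of a die whose area is split between
    cache (memory) and logic."""
    return cache_fraction * memory_w_cm2 + (1 - cache_fraction) * logic_w_cm2


# Hypothetical densities: logic 50 W/cm^2, memory 5 W/cm^2.
for frac in (0.2, 0.4, 0.6):
    print(f"cache {frac:.0%} of area -> {chip_power_density(frac, 50, 5):.0f} W/cm^2")
```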
Chip Multi-Processing
Keynote presentation (L. Benini, RSP 2010).
Chip Multi-Processing
[Figure: relative performance vs. die area and power (1 to 4) for chip multi-processing (CMP) and a single-thread core (ST); CMP floorplan with cores C1 to C4 and a shared cache]
• Multi-core, each core multi-threaded
• Shared cache and front side bus
• Each core has different Vdd & freq
• Spreading hot spots
• Lower junction temperature
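One way to model the CMP vs. ST comparison is sketched below. It rests on two assumptions that are not taken from the chart. First, single-thread performance grows roughly with the square root of area (Pollack's rule of thumb), with area standing in for power, as on the chart's x-axis. Second, Amdahl's law applies to a multi-core built from unit-area cores. The parallel fraction of 0.9 is hypothetical.

```python
import math


def single_core_perf(area: float) -> float:
    """Assumed Pollack's rule of thumb: single-thread performance ~ sqrt(area)."""
    return math.sqrt(area)


def cmp_perf(area: float, core_area: float = 1.0, parallel_fraction: float = 1.0) -> float:
    """Performance of a chip built from identical small cores of core_area,
    using Amdahl's law for the parallel fraction of the workload."""
    cores = area / core_area
    per_core = single_core_perf(core_area)
    speedup = 1 / ((1 - parallel_fraction) + parallel_fraction / cores)
    return per_core * speedup


for area in (1, 2, 3, 4):
    print(f"area/power {area}x: one big core {single_core_perf(area):.2f}x, "
          f"CMP {cmp_perf(area, parallel_fraction=0.9):.2f}x")
```

Under these assumptions, spending the same area and power on several cores outperforms one larger core, provided the workload is sufficiently parallel.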
What Will the Cores Look Like?
• Intelligent workload redistribution
• Improved energy efficiency
• Multiple functionalities
The Exponential Reward
[Figure: MIPS (log scale, 0.01 to 1,000,000) vs. year, 1970 to 2010. Era of pipelined architecture: 8086, 286, 386, 486. Era of instruction-level parallelism: super scalar; speculative, OOO. Era of thread & processor-level parallelism: multi-threaded; multi-threaded, multi-core; special-purpose HW]
Summary: Delaying Forever
Terascale transistor integration capacity will be available; power and energy are the barriers
Variations will be even more prominent; shift from deterministic to probabilistic design
Improve design efficiency
Exploit integration capacity to deliver performance in the power/cost envelope
Exercises
1. Discuss a problem associated with device integration.
2. Comment on the statement: "Shrinking transistor size changes the paradigm for evaluating energy consumption and execution time from deterministic to probabilistic."
3. Why is static energy consumption so problematic for future technologies?
4. Why is voltage reduction one of the main elements to address in order to reduce energy consumption?
5. How can a system with multiple supply voltages help reduce energy consumption? What is the effect on execution time?
6. Draw an illustration showing how a multithreaded program can make better use of a system's resources by reducing the memory communication bottleneck.
7. Why does on-chip memory exceed 50% of the integrated circuit in current processors?
8. Given the limits of scaling, what can be done to keep increasing machine performance?
9. What are the trends in computing (cores), communication infrastructure, and storage for upcoming processors?