![Page 1: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/1.jpg)
Clock, Power Consumption and the Future Landscape of Computation
Prof. Usagi
![Page 2: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/2.jpg)
Recap: Moore’s Law
• The number of transistors we can build in a fixed area of silicon doubles every 12 to 24 months.
[Figure: transistor count (log scale, 1 to 10,000,000,000) versus year, 1970 to 2015 (1)]
(1) Moore, G. E. (1965), 'Cramming more components onto integrated circuits', Electronics 38 (8).
Moore’s Law is the most important driver for
historic CPU performance gains
![Page 3: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/3.jpg)
Recap: Pipelining a 4-bit serial adder
[Figure: pipeline timing diagram. The instructions add a, b through add u, v enter the pipeline one per cycle, and each passes through the 1st, 2nd, 3rd, and 4th stages in consecutive cycles along the time axis t.]
After this point, we are completing an add operation each cycle!
$\frac{\text{Cycles}}{\text{Add}} = 1$
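The claim that a full pipeline retires one add per cycle can be checked with a small count. This is a minimal sketch, assuming an ideal 4-stage pipeline with no stalls; the function names are illustrative and not from the lecture.

```python
def pipelined_cycles(num_ops: int, num_stages: int = 4) -> int:
    """Cycles to finish num_ops back-to-back operations on an ideal pipeline."""
    if num_ops <= 0:
        return 0
    # The first operation needs num_stages cycles to pass through every stage;
    # each later operation completes one cycle after the previous one.
    return num_stages + (num_ops - 1)


def unpipelined_cycles(num_ops: int, num_stages: int = 4) -> int:
    """Cycles if each operation must finish all stages before the next starts."""
    return max(num_ops, 0) * num_stages


if __name__ == "__main__":
    for n in (1, 11, 1000):
        p = pipelined_cycles(n)
        print(f"{n} adds: {p} cycles pipelined, "
              f"{unpipelined_cycles(n)} unpipelined, {p / n:.3f} cycles per add")
```

As the number of adds grows, the cycles-per-add ratio approaches 1, which is the steady state the slide describes.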
![Page 4: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/4.jpg)
Recap: The growth of clock rate is slowing down
![Page 5: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/5.jpg)
Outline
• What are the basic limits of clock frequency?
• New limit on clock frequency: Power consumption
• Opportunities and the future
![Page 6: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/6.jpg)
Timing constraints
![Page 7: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/7.jpg)
Output Timing Constraints
• Min delay of FF, also called contamination delay or min CLK-to-Q delay: tccq
  • Time after the clock edge at which Q might become unstable (i.e., starts changing)
• Max delay of FF, also called propagation delay or max CLK-to-Q delay: tpcq
  • Time after the clock edge by which the output Q is guaranteed to be stable (i.e., stops changing)
[Figure: D flip-flop with input D, output Q, and clock CLK; waveform marking tccq and tpcq on Q after the CLK edge]
![Page 8: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/8.jpg)
Setup and hold times for a flip-flop
• Setup time: tsetup
  • Time before the clock edge that data must be stable (i.e., not change)
  • For the FF to capture the input
• Hold time: thold
  • Time after the clock edge that data must be stable
  • For the FF to output/store/propagate the data
• Aperture time: ta
  • Time around the clock edge that data must be stable (ta = tsetup + thold)
[Figure: clock edge with tsetup before it, thold after it, and ta spanning both]
![Page 9: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/9.jpg)
Summary on timing constraints
• Combinational:
  • Maximum delay = Propagation delay (tpd)
  • Minimum delay = Contamination delay (tcd)
• Flip Flops:
  • Input
    • Setup time (tsetup)
    • Hold time (thold)
  • Output
    • Propagation clock-to-Q time (tpcq)
    • Contamination clock-to-Q time (tccq)
Once the logic/FFs are built, these timing characteristics are fixed properties
[Figure: register R1 (D1, Q1) feeding combinational logic feeding register R2 (D2, Q2), both clocked by CLK]
![Page 10: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/10.jpg)
Timing in a circuit
[Figure: registers R1 and R2, both clocked by CLK, connected through combinational logic; waveform showing the signals Q1 and C1 for cycles i-1, i, and i+1]
![Page 11: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/11.jpg)
Causes of Timing Issues in Sequential Circuits
• Input to a FF comes from the output of another FF through a combinational circuit
• The FF and combinational circuit have a min & max delay
• Which of the following violations occurs if the max delay of R1 is zero and the max delay of the combinational circuit is equal to the clock period?
A. Hold time violation for R2
B. Setup violation for R2
C. Hold time violation for R1
D. Setup violation for R1
E. None of the above
[Figure: register R1 feeding combinational logic feeding register R2, both clocked by CLK]
![Page 12: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/12.jpg)
Setup time constraint for R2:
$T_c \ge t_{setup} + \text{max\_delay(FF)} + \text{max\_delay(combinational)}$
$T_c \ge t_{setup} + t_{pcq} + t_{pd}$
Here $t_{pcq} = 0$ and $t_{pd} = T_c$, so the constraint can hold only if $t_{setup} \le 0$: the setup time of R2 is violated (answer B).
![Page 13: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/13.jpg)
Causes of Timing Issues in Sequential Circuits (2)
• Input to a FF comes from the output of another FF through a combinational circuit
• The FF and combinational circuit have a min & max delay
• Which of the following violations occurs if the min delay of R1 is zero and the combinational circuit is just a wire (zero delay)?
A. Hold time violation for R2
B. Setup violation for R2
C. Hold time violation for R1
D. Setup violation for R1
E. None of the above
[Figure: register R1 feeding combinational logic feeding register R2, both clocked by CLK]
![Page 14: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/14.jpg)
Hold time constraint for R2:
$t_{hold} \le \text{min\_delay(FF)} + \text{min\_delay(combinational)}$
$t_{hold} \le t_{ccq} + t_{cd}$
Here $t_{ccq} = 0$ and $t_{cd} = 0$, so the constraint can hold only if $t_{hold} \le 0$: the hold time of R2 is violated (answer A).
![Page 15: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/15.jpg)
Timing analysis
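[Figure: combinational circuit between flip-flops clocked by CLK, with signals A, B, C, D, X, Y, X’, Y’]

| Flip flops | |
|---|---|
| tccq | 30 ps |
| tpcq | 50 ps |
| tsetup | 60 ps |
| thold | 70 ps |

| Gates | |
|---|---|
| tpd | 35 ps |
| tcd | 25 ps |

Setup time constraint
$t_{pd} = 35 \times 3 = 105$ ps
$T_c \ge t_{pcq} + t_{pd} + t_{setup} + t_{skew}$
$T_c \ge 50 + 105 + 60 + 0 = 215$ ps
Hold time constraint
$t_{cd} = 25$ ps
$t_{ccq} + t_{cd} > t_{hold}$
30 ps + 25 ps = 55 ps, but thold = 70 ps. The hold constraint is not met!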
![Page 16: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/16.jpg)
Timing analysis
Same circuit and timing parameters as the previous slide, with buffers (+25 ps) added on the short path.
Setup time constraint (unchanged)
$T_c \ge t_{pcq} + t_{pd} + t_{setup} + t_{skew}$
$T_c \ge 50 + 105 + 60 + 0 = 215$ ps
Hold time constraint
$t_{ccq} + t_{cd} > t_{hold}$
30 ps + 25 ps + 25 ps = 80 ps > 70 ps, so the hold constraint is now met.
Max frequency = 1/215 ps = 4.65 GHz!
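The two checks in this timing analysis can be reproduced in a few lines. This is a sketch using the flip-flop and gate numbers from the slide; the function names are made up for illustration, and the path lengths (three gates on the critical path, one gate on the short path) are read off the slide's arithmetic rather than from the circuit itself.

```python
def min_clock_period_ps(t_pcq, t_pd, t_setup, t_skew=0.0):
    """Setup constraint: Tc >= tpcq + tpd + tsetup + tskew."""
    return t_pcq + t_pd + t_setup + t_skew


def hold_ok(t_ccq, t_cd, t_hold):
    """Hold constraint: tccq + tcd > thold."""
    return t_ccq + t_cd > t_hold


def max_frequency_ghz(t_c_ps):
    """Convert a clock period in picoseconds to a frequency in GHz."""
    return 1000.0 / t_c_ps


if __name__ == "__main__":
    # Values from the slide, all in picoseconds.
    ff = {"t_ccq": 30, "t_pcq": 50, "t_setup": 60, "t_hold": 70}
    gate_tpd, gate_tcd = 35, 25

    t_pd = 3 * gate_tpd   # critical path: three gate delays (35*3 on the slide)
    t_cd = gate_tcd       # short path: one gate delay
    t_c = min_clock_period_ps(ff["t_pcq"], t_pd, ff["t_setup"])
    print(f"Tc >= {t_c} ps, max frequency = {max_frequency_ghz(t_c):.2f} GHz")
    print("hold met without buffer:", hold_ok(ff["t_ccq"], t_cd, ff["t_hold"]))
    print("hold met with one buffer:",
          hold_ok(ff["t_ccq"], t_cd + gate_tcd, ff["t_hold"]))
```

The buffer lengthens only the short path, so the minimum clock period stays at 215 ps.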
![Page 17: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/17.jpg)
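Example: timing constraints
• What’s the maximum frequency?
A. 1 / 110ns
B. 1 / 220ns
C. 1 / 200ns
D. 1 / 180ns
E. None of the above
[Figure: circuit between flip-flops clocked by CLK, with signals A, B, C, D, E]

| Flip flops | |
|---|---|
| tccq | 10 ns |
| tpcq | 70 ns |
| tsetup | 20 ns |
| thold | 30 ns |

| Gate | tpd | tcd |
|---|---|---|
| AND | 20 ns | 10 ns |
| NOT | 10 ns | 10 ns |
| XOR | 110 ns | 50 ns |

$T_c \ge t_{pcq} + t_{pd} + t_{setup}$
$t_{ccq} + t_{cd} > t_{hold}$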
![Page 18: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/18.jpg)
$t_{pd} = 110\ \text{ns} + 20\ \text{ns} = 130\ \text{ns}$
$T_c \ge t_{pcq} + t_{pd} + t_{setup} + t_{skew}$
$T_c \ge 70\ \text{ns} + 130\ \text{ns} + 20\ \text{ns} + 0 = 220\ \text{ns}$
Max frequency = 1 / 220 ns (answer B)
![Page 19: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/19.jpg)
FF Timing Parameters
• Once a flip flop has been built, its timing characteristics stay fixed: tsetup, thold, tccq, tpcq
• What about the clock? Does the clock edge arrive at all the D-FFs on the chip at the same time?
[Figure: register R1 (D1, Q1) feeding combinational logic feeding register R2 (D2, Q2), both clocked by CLK]
![Page 20: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/20.jpg)
![Page 21: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/21.jpg)
Clock Skew
• The clock doesn’t arrive at all registers at the same time
• Skew: the difference between the arrival times of the same clock edge at two registers
• Perform the worst case analysis
The wire has its own delay!!!
[Figure: register R1 (D1, Q1) feeding combinational logic feeding register R2 (D2, Q2); waveforms of CLK at R1 and CLK at R2, offset by tskew]
![Page 22: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/22.jpg)
Setup Time Constraint with Skew
• In the worst case, CLK2 is earlier than CLK1
• tpcq is max delay through FF, tpd is max delay through logic
$T_c \ge t_{pcq} + t_{pd} + t_{setup} + t_{skew}$
$t_{pd} \le T_c - (t_{setup} + t_{pcq} + t_{skew})$
[Figure: register R1 (D1, Q1) feeding combinational logic feeding register R2 (D2, Q2); waveforms of CLK at R1, CLK at R2, Q1, and D2, marking tpcq, tpd, tsetup, and tskew within one clock period]
The larger the design, the longer the tskew
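To see how skew eats into the time left for logic, the rearranged constraint can be evaluated for a few skew values. This sketch reuses the flip-flop numbers from the earlier timing-analysis slide (tpcq = 50 ps, tsetup = 60 ps); the 250 ps clock period and the skew values are hypothetical, chosen only for illustration.

```python
def max_logic_delay_ps(t_c, t_pcq, t_setup, t_skew):
    """Largest allowed tpd: tpd <= Tc - (tsetup + tpcq + tskew)."""
    return t_c - (t_setup + t_pcq + t_skew)


if __name__ == "__main__":
    t_c = 250                 # hypothetical clock period in ps (4 GHz)
    t_pcq, t_setup = 50, 60   # flip-flop values from the earlier slide, in ps
    for t_skew in (0, 20, 40):  # hypothetical skew values in ps
        budget = max_logic_delay_ps(t_c, t_pcq, t_setup, t_skew)
        print(f"tskew = {t_skew:2d} ps -> tpd budget = {budget} ps")
```

Every picosecond of skew is a picosecond taken away from the combinational logic, which is why larger designs with longer clock wires are harder to clock fast.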
![Page 23: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/23.jpg)
Power consumption
![Page 24: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/24.jpg)
Power & Energy
• Regarding power and energy, how many of the following statements are correct?
① Lowering the power consumption helps extend the battery life
② Lowering the power consumption helps reduce the heat generation
③ Lowering the energy consumption helps reduce the electricity bill
④ A CPU with 10% utilization can still consume 33% of the peak power
A. 0 B. 1 C. 2 D. 3 E. 4
![Page 25: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/25.jpg)
Power vs. Energy
• Power is the direct contributor of “heat”
  • Packaging of the chip
  • Heat dissipation cost
• $\text{Power} = P_{dynamic} + P_{static}$
• $\text{Energy} = P \times ET$ (ET: execution time)
  • The electricity bill and battery life are related to energy!
  • Lower power does not necessarily mean better battery life if the processor slows down the application too much
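A quick calculation shows why lower power does not automatically mean lower energy. The two configurations below are hypothetical numbers picked for illustration, not measurements from the lecture.

```python
def energy_joules(power_watts: float, exec_time_s: float) -> float:
    """Energy = P * ET."""
    return power_watts * exec_time_s


if __name__ == "__main__":
    # Hypothetical settings for the same task on the same processor.
    fast = energy_joules(power_watts=10.0, exec_time_s=2.0)  # higher power, finishes sooner
    slow = energy_joules(power_watts=6.0, exec_time_s=4.0)   # lower power, takes twice as long
    print(f"fast setting: {fast} J, slow setting: {slow} J")
    print("lower power saved energy:", slow < fast)
```

In this made-up case the low-power setting draws 40% less power but uses 20% more energy, because the task runs twice as long.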
![Page 27: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/27.jpg)
Dynamic/Active Power
• The power consumption due to the switching of transistor states
• Dynamic power per transistor
  • α: average switches per cycle
  • C: capacitance
  • V: voltage
  • f: frequency, usually linear with V
• N: the number of transistors
$P_{dynamic} \sim \alpha \times C \times V^2 \times f \times N$
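Because the relation is a proportionality, it is easiest to use it with ratios against a baseline. The sketch below does that; the function name and the 0.8 scale factors are illustrative assumptions.

```python
def relative_dynamic_power(v_scale=1.0, f_scale=1.0, alpha_scale=1.0,
                           c_scale=1.0, n_scale=1.0):
    """Ratio of new to old dynamic power under P ~ alpha * C * V^2 * f * N."""
    return alpha_scale * c_scale * v_scale ** 2 * f_scale * n_scale


if __name__ == "__main__":
    # Lowering only the frequency by 20% saves 20% of the dynamic power.
    print(f"f x0.8:         {relative_dynamic_power(f_scale=0.8):.3f}")
    # If the voltage can be lowered along with the frequency (f roughly linear
    # in V, as the slide notes), the saving is close to cubic.
    print(f"V x0.8, f x0.8: {relative_dynamic_power(v_scale=0.8, f_scale=0.8):.3f}")
    # Doubling the transistor count at fixed V and f doubles the power.
    print(f"N x2:           {relative_dynamic_power(n_scale=2.0):.3f}")
```

The quadratic voltage term is the reason voltage scaling matters more than frequency scaling alone.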
![Page 28: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/28.jpg)
Static/Leakage Power
• The power consumption due to leakage: transistors do not turn off completely even when they are not operating
• Becomes the dominant factor in the most advanced process technologies.
  • N: number of transistors
  • V: voltage
  • Vt: threshold voltage where the transistor conducts (begins to switch)
$P_{leakage} \sim N \times V \times e^{-V_t}$
![Page 31: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/31.jpg)
Dennardian Broken
• Given a scaling factor S

| Parameter | Relation | Classical Scaling | Leakage Limited |
|---|---|---|---|
| Power Budget | | 1 | 1 |
| Chip Size | | 1 | 1 |
| Vdd (Supply Voltage) | | 1/S | 1 |
| Vt (Threshold Voltage) | 1/S | 1/S | 1 |
| tox (oxide thickness) | | 1/S | 1/S |
| W, L (transistor dimensions) | | 1/S | 1/S |
| Cgate (gate capacitance) | WL/tox | 1/S | 1/S |
| Isat (saturation current) | WVdd/tox | 1/S | 1 |
| F (device frequency) | Isat/(CgateVdd) | S | S |
| D (Device/Area) | 1/(WL) | S² | S² |
| p (device power) | IsatVdd | 1/S² | 1 |
| P (chip power) | Dp | 1 | S² |
| U (utilization) | 1/P | 1 | 1/S² |
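The derived rows of the table follow from the primitive ones through the formulas in the Relation column. The sketch below recomputes them for a given S under both regimes; the function and key names are illustrative, and the only inputs are the table's scaling rules for Vdd, tox, W, and L.

```python
def scaling_factors(s: float, leakage_limited: bool) -> dict:
    """Recompute the derived rows of the scaling table for scaling factor s."""
    vdd = 1.0 if leakage_limited else 1.0 / s  # supply voltage stops scaling
    tox = 1.0 / s                              # oxide thickness
    w = l = 1.0 / s                            # transistor dimensions

    c_gate = w * l / tox               # gate capacitance: WL/tox
    i_sat = w * vdd / tox              # saturation current: W*Vdd/tox
    freq = i_sat / (c_gate * vdd)      # device frequency: Isat/(Cgate*Vdd)
    density = 1.0 / (w * l)            # devices per area: 1/(WL)
    device_power = i_sat * vdd         # p = Isat*Vdd
    chip_power = density * device_power  # P = D*p
    utilization = 1.0 / chip_power     # U = 1/P at a fixed power budget
    return {"Cgate": c_gate, "Isat": i_sat, "F": freq, "D": density,
            "p": device_power, "P": chip_power, "U": utilization}


if __name__ == "__main__":
    for limited in (False, True):
        name = "leakage limited" if limited else "classical"
        row = scaling_factors(2.0, limited)
        print(name, {k: round(v, 3) for k, v in row.items()})
```

With S = 2, classical scaling keeps chip power at 1, while the leakage-limited case gives chip power S² = 4 and utilization 1/S² = 0.25.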
![Page 32: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/32.jpg)
What happens if power doesn’t scale with process technologies?
• Suppose Moore’s law continues, so the technology lets us cram more transistors into the same chip area, but the power consumption per transistor remains the same. How many of the following statements are true?
① The power consumption per chip will increase
② The power density of the chip will increase
③ Given the same power budget, we may not be able to power on all chip area if we maintain the same clock rate
④ Given the same power budget, we may have to lower the clock rate of circuits to power on all chip area
A. 0 B. 1 C. 2 D. 3 E. 4
![Page 33: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/33.jpg)
Power consumption
[Figure: three chips of the same area. Original chip: 7 × 7 = 49 transistors at 1 W each = 49 W. Dennardian scaling: 10 × 10 = 100 transistors at 0.5 W each = 50 W. Dennardian broken: 10 × 10 = 100 transistors at 1 W each = 100 W!]
![Page 34: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/34.jpg)
Power density
[Figure: the same three chips compared by power density. Original chip: 49 W / chip area. Dennardian scaling: 50 W / chip area. Dennardian broken: 100 W / chip area.]
![Page 35: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/35.jpg)
Power density
Source: https://www.cadalyst.com/hardware/workstation-performance-tomorrow039s-possibilities-viewpoint-column-6351
![Page 37: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/37.jpg)
Power consumption to light up all transistors
[Figure: the same three chips. Original chip: 49 transistors at 1 W each = 49 W. Dennardian scaling: 100 transistors at 0.5 W each = 50 W. Dennardian broken: 100 transistors at 1 W each = 100 W!]
If we can only cool down 50W in the same area: on ~ 50W, off ~ 0W. The part of the Dennardian-broken chip that must stay off is dark!
![Page 38: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/38.jpg)
Dark silicon
• Your power consumption goes up as the number of transistors goes up
• Even though Moore’s Law allows us to put more transistors within the same area, we cannot use them all simultaneously!
• We have no choice but to leave some transistors inactive at any given time!
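The dark fraction follows directly from the power budget. This sketch uses the three chips from the earlier figure (49, 100, and 100 transistors) with a 50 W cooling budget; the function name is illustrative, and treating every transistor as either fully on or fully off is a simplifying assumption.

```python
def active_fraction(num_transistors: int, power_per_transistor: float,
                    power_budget: float) -> float:
    """Fraction of transistors that can be on at once within the budget."""
    full_power = num_transistors * power_per_transistor
    return min(1.0, power_budget / full_power)


if __name__ == "__main__":
    budget = 50.0  # watts that can be cooled in this chip area
    chips = {
        "original (49 x 1 W)": (49, 1.0),
        "Dennardian scaling (100 x 0.5 W)": (100, 0.5),
        "Dennardian broken (100 x 1 W)": (100, 1.0),
    }
    for name, (n, p) in chips.items():
        on = active_fraction(n, p, budget)
        print(f"{name}: {on:.0%} on, {1 - on:.0%} dark")
```

Only the chip whose per-transistor power failed to scale ends up with dark area: half of it must stay off to remain within 50 W.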
![Page 42: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/42.jpg)
Solutions/trends in dark silicon era
![Page 43: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/43.jpg)
Trends in the Dark Silicon Era
• Aggressive dynamic voltage/frequency scaling
• Throughput-oriented: slower, but more
• Just let it go dark: activate part of the circuits, but not all
• From general-purpose to domain-specific: ASIC
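The "slower, but more" trend can be motivated with the dynamic power relation $P \sim V^2 \times f$ per core. This is a rough sketch under strong assumptions that are not from the lecture: the workload parallelizes perfectly, voltage can be lowered linearly with frequency, and leakage is ignored.

```python
def relative_power_many_cores(num_cores: int) -> float:
    """Dynamic power of k cores at 1/k frequency, relative to one full-speed core."""
    f_scale = 1.0 / num_cores   # each core runs at 1/k of the original frequency
    v_scale = f_scale           # assumption: V scales linearly with f
    per_core = v_scale ** 2 * f_scale
    return num_cores * per_core


def relative_throughput(num_cores: int) -> float:
    """Aggregate throughput, assuming a perfectly parallel workload."""
    return num_cores * (1.0 / num_cores)


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(f"{k} core(s): throughput x{relative_throughput(k):.2f}, "
              f"dynamic power x{relative_power_many_cores(k):.4f}")
```

Under these idealized assumptions the same throughput costs 1/k² of the dynamic power. Real designs gain less, since voltage cannot be lowered indefinitely and leakage does not go away.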
![Page 44: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/44.jpg)
Aggressive dynamic frequency scaling
![Page 46: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/46.jpg)
Frequency varies per core
![Page 47: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/47.jpg)
Demo
• You may use cat /proc/cpuinfo to see all the details of your processors
• You may add “| grep MHz” to see the frequencies of your cores
• Only very few of them are at the boosted frequency
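The same check can be scripted. This is a minimal sketch that assumes a Linux system whose /proc/cpuinfo reports a "cpu MHz" line per core (typical on x86; some platforms omit it); the function name is illustrative.

```python
import re
from pathlib import Path
from typing import List


def core_frequencies_mhz(cpuinfo_text: str) -> List[float]:
    """Extract the 'cpu MHz' value reported for each core."""
    matches = re.findall(r"^cpu MHz\s*:\s*([0-9.]+)", cpuinfo_text,
                         flags=re.MULTILINE)
    return [float(m) for m in matches]


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        freqs = core_frequencies_mhz(path.read_text())
        if freqs:
            print(f"{len(freqs)} cores, min {min(freqs):.0f} MHz, "
                  f"max {max(freqs):.0f} MHz")
        else:
            print("no 'cpu MHz' lines found on this platform")
    else:
        print("/proc/cpuinfo is not available on this system")
```

Running it a few times under different loads should show the spread between idle and boosted cores.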
![Page 48: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/48.jpg)
Slower, but more
![Page 49: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/49.jpg)
More cores per chip, slower per core
![Page 50: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/50.jpg)
An Overview of Kepler GK110 and GK210 Architecture
Kepler GK110 was built first and foremost for Tesla, and its goal was to be the highest performing
parallel computing microprocessor in the world. GK110 not only greatly exceeds the raw compute
horsepower delivered by previous generation GPUs, but it does so efficiently, consuming significantly
less power and generating much less heat output.
GK110 and GK210 are both designed to provide fast double precision computing performance to
accelerate professional HPC compute workloads; this is a key difference from the NVIDIA Maxwell GPU
architecture, which is designed primarily for fast graphics performance and single precision consumer
compute tasks. While the Maxwell architecture performs double precision calculations at rate of 1/32
that of single precision calculations, the GK110 and GK210 Kepler-based GPUs are capable of performing
double precision calculations at a rate of up to 1/3 of single precision compute performance.
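To see what the 1/32 and 1/3 ratios mean in practice, here is a small sketch. The ratios come from the text above; the single-precision rate of 3000 GFLOP/s is a made-up number, and applying the same rate to both architectures is a simplification, since real products differ.

```python
from fractions import Fraction

# Double-precision : single-precision rate ratios quoted in the text above.
MAXWELL_DP_RATIO = Fraction(1, 32)
KEPLER_DP_RATIO = Fraction(1, 3)  # stated as "up to" 1/3, so this is an upper bound

def dp_rate(sp_rate, dp_ratio):
    """Double-precision rate implied by a single-precision rate and a DP:SP ratio."""
    return sp_rate * dp_ratio

def relative_dp_advantage():
    """How many times higher Kepler's DP ratio is than Maxwell's."""
    return KEPLER_DP_RATIO / MAXWELL_DP_RATIO

if __name__ == "__main__":
    hypothetical_sp_gflops = 3000  # invented figure, not a product specification
    print("Kepler DP :", float(dp_rate(hypothetical_sp_gflops, KEPLER_DP_RATIO)), "GFLOP/s")
    print("Maxwell DP:", float(dp_rate(hypothetical_sp_gflops, MAXWELL_DP_RATIO)), "GFLOP/s")
    print("Ratio of ratios:", relative_dp_advantage())
```

At equal single-precision throughput, the Kepler parts would deliver roughly ten times the double-precision throughput, which is why they target HPC workloads.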
Full Kepler GK110 and GK210 implementations include 15 SMX units and six 64-bit memory controllers. Different products will use different configurations. For example, some products may deploy 13 or 14 SMXs. Key features of the architecture that will be discussed below in more depth include:
• The new SMX processor architecture
• An enhanced memory subsystem, offering additional caching capabilities, more bandwidth at each level of the hierarchy, and a fully redesigned and substantially faster DRAM I/O implementation.
• Hardware support throughout the design to enable new programming model capabilities
• GK210 expands upon GK110’s on-chip resources, doubling the available register file and shared memory capacities per SMX.
[Figure labels: SMX (Streaming Multiprocessor); thread scheduler; GPU global memory; high-bandwidth memory controllers]
The rise of GPUs
![Page 51: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/51.jpg)
51
Streaming Multiprocessor (SMX) Architecture
The Kepler GK110/GK210 SMX unit features several architectural innovations that make it the most powerful multiprocessor we’ve built for double precision compute workloads.
SMX: 192 single-precision CUDA cores, 64 double-precision units, 32 special function units (SFU), and 32 load/store units (LD/ST).
Each of these performs the same operation, but each of these is also a “thread”.
A total of 16*12 = 192 cores!
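The unit counts on this slide can be tallied with a short sketch. The per-SMX numbers and the 13/14/15 SMX configurations come from the slides; the dictionary keys and function names are just labels chosen for this example.

```python
from fractions import Fraction

# Execution units in one SMX, as listed on the slide.
SMX_UNITS = {"sp_cores": 192, "dp_units": 64, "sfu": 32, "ld_st": 32}

def chip_totals(num_smx):
    """Total units of each kind on a chip with the given number of SMXs."""
    return {name: count * num_smx for name, count in SMX_UNITS.items()}

def dp_to_sp_unit_ratio():
    """Ratio of double-precision units to single-precision cores in one SMX."""
    return Fraction(SMX_UNITS["dp_units"], SMX_UNITS["sp_cores"])

if __name__ == "__main__":
    # The 16 * 12 grid on the slide accounts for all single-precision cores.
    assert 16 * 12 == SMX_UNITS["sp_cores"]
    for num_smx in (13, 14, 15):
        print(num_smx, "SMXs:", chip_totals(num_smx))
    print("DP units per SP core:", dp_to_sp_unit_ratio())
```

The 64:192 unit ratio matches the "up to 1/3" double-precision rate quoted in the Kepler overview.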
![Page 52: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/52.jpg)
ARM’s big.LITTLE architecture
52
![Page 53: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/53.jpg)
Just let it go dark
53
![Page 54: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/54.jpg)
NVIDIA’s Turing Architecture
54
![Page 55: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/55.jpg)
Programming in Turing Architecture
55
// Use tensor cores
cublasErrCheck(cublasSetMathMode(cublasHandle, CUBLAS_TENSOR_OP_MATH));
// Make them 16-bit
convertFp32ToFp16 <<< (MATRIX_M * MATRIX_K + 255) / 256, 256 >>> (a_fp16, a_fp32, MATRIX_M * MATRIX_K);
convertFp32ToFp16 <<< (MATRIX_K * MATRIX_N + 255) / 256, 256 >>> (b_fp16, b_fp32, MATRIX_K * MATRIX_N);
// Call GEMM
cublasErrCheck(cublasGemmEx(cublasHandle, CUBLAS_OP_N, CUBLAS_OP_N,
                            MATRIX_M, MATRIX_N, MATRIX_K,
                            &alpha,
                            a_fp16, CUDA_R_16F, MATRIX_M,
                            b_fp16, CUDA_R_16F, MATRIX_K,
                            &beta,
                            c_cublas, CUDA_R_32F, MATRIX_M,
                            CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP));
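Two bits of arithmetic hide in the "make them 16-bit" step: how many 256-thread blocks the launch needs, and what precision is lost by going to half precision. The sketch below illustrates both on the host side. It assumes convertFp32ToFp16 handles one element per thread (its body is not shown on the slide), and it rounds Python doubles straight to half precision, which can differ from an FP32-to-FP16 conversion in rare corner cases. The function names are invented for this example.

```python
import struct

def blocks_needed(num_elements, threads_per_block=256):
    """Ceiling division: the smallest block count whose threads cover every element.

    Same arithmetic as (N + 255) / 256 in the kernel launch above.
    """
    return (num_elements + threads_per_block - 1) // threads_per_block

def round_to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # struct's "e" format code is binary16; pack then unpack to see the stored value.
    return struct.unpack("<e", struct.pack("<e", x))[0]

if __name__ == "__main__":
    for n in (1, 256, 257, 1000):
        print(f"{n} elements -> {blocks_needed(n)} block(s)")
    for x in (1.0, 0.1, 2049.0):
        print(f"{x} stored as fp16 -> {round_to_fp16(x)}")
```

The rounding error is the price of using the tensor cores on 16-bit inputs; the GEMM call above still accumulates and writes its result in 32-bit (CUDA_R_32F).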
![Page 56: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/56.jpg)
NVIDIA’s Turing Architecture
56
You can use only one type of these ALUs at a time, not all of them at once
![Page 57: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/57.jpg)
The rise of ASICs
57
![Page 58: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/58.jpg)
58
![Page 59: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/59.jpg)
59
![Page 60: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/60.jpg)
There is no pure “software” or “hardware” design in the dark silicon era. Everything needs to be hardware/software co-designed.
–Prof. Usagi
60
![Page 61: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/61.jpg)
• iEval: capture a screenshot and submit it through iLearn to receive a full-credit assignment
• Assignment 6 is due 6/4
• Lab 6 is due this Friday
• Please fill out the ABET survey through iLearn
• The final exam will be held during the campus-scheduled period to avoid conflicts
• Final review: 6/4 during the lecture; the sample final will also be released
• 6/11, 11:30am to 2:59:59pm
• About the same format as the midterm, but longer
• There will be a final review on 6/6 to help you prepare
61
Announcement
![Page 62: Clock, Power Consumption and the Future Landscape of ... · ① Lowering the power consumption helps extending the battery life ② Lowering the power consumption helps reducing the](https://reader034.vdocuments.us/reader034/viewer/2022050515/5f9f9b75e872bf1a336baea0/html5/thumbnails/62.jpg)
To be continued
Electrical Engineering / Computer Science 120A