ECE5461: Low Power SoC Design
Tae Hee Han: [email protected]
Semiconductor Systems Engineering
Sungkyunkwan University
Low Power SRAM Issue
Role of Memory in ICs
• Memory is a critical component of modern ICs
• The focus in this lecture is embedded memory
• The percentage of die area devoted to memory keeps increasing
Introduction
• SRAM is the most common embedded-memory option for CMOS ICs
• As the supply voltage of low-power ICs decreases, SRAM must remain compatible with the reduced operating conditions
• At the same time, increasingly parallel architectures demand more on-chip cache (embedded SRAM arrays) to share information effectively across parallel processing units
• Achieving low-voltage operation in SRAM faces challenges originating from process variation, related to bit-cell stability, sensing, architecture, and efficient CAD methodologies
Processor Area Becoming Memory Dominated
• On-chip SRAM accounts for 50–90% of the total transistor count
  • Xeon: 48M of 110M transistors
  • Itanium 2: 144M of 220M transistors
• SRAM is a major source of chip static power dissipation
  • Dominant in ultra-low-power applications
  • A substantial fraction in others
[Figure: die photo of Intel Penryn™ highlighting the SRAM arrays (picture courtesy of Intel)]
§ Process variations increase with scaling
§ The large number of cells requires analysis of distribution tails (out to 6σ or 7σ)
§ Within-die VTH variation arises from Random Dopant Fluctuations (RDFs)
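Why tails out to 6σ or 7σ matter can be seen with a quick back-of-the-envelope calculation; the Gaussian parameter distribution and the 1 Mb array size are illustrative assumptions:

```python
import math

def tail_prob(n_sigma):
    """One-sided Gaussian tail probability P(X > mu + n_sigma * sigma)."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

cells = 1 << 20  # 1 Mb array (illustrative)

for n in (3, 4, 5, 6):
    p = tail_prob(n)
    # Expected number of cells whose parameter lies beyond the n-sigma point.
    print(f"{n} sigma: per-cell tail = {p:.2e}, expected outliers = {cells * p:.3g}")
```

At 3σ roughly 1 in 740 cells lies in the tail, i.e. over a thousand outlier cells in a 1 Mb array; the expected outlier count only drops below one past about 5σ, which is why SRAM yield analysis must reach out to 6σ or 7σ.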
SRAM Metrics
• Functionality
  • Data retention
  • Readability
  • Writability
• Soft errors
• Area
• Power
Why is functionality a “metric”?
Where Does SRAM Power Go?
• Numerous analytical SRAM power models exist
• They show great variety in their power breakdowns
• Different applications cause different components of power to dominate
• Hence, the breakdown depends on the application: e.g., high-speed versus low-power or portable
SRAM Cell
• Three tasks of a cell:
  • Hold data: WL = 0; BLs = X (don't care)
  • Write: WL = 1; BLs driven with the new data
  • Read: WL = 1; BLs precharged and left floating
[Figure: 6T SRAM cell schematic with cross-coupled inverters (M1–M4) storing Q/QB, access transistors M5/M6, bitlines BL/BLB, and wordline WL]
Traditional 6-Transistor (6T) SRAM cell
Key SRAM Cell Metrics
• Key functionality metrics:
  • Hold: Static Noise Margin (SNM), Data Retention Voltage (DRV)
  • Read: Static Noise Margin (SNM)
  • Write: Write Margin
• Area is the primary constraint; next come power and delay
[Figure: traditional 6-transistor (6T) SRAM cell schematic (M1–M6, storage nodes Q/QB, bitlines BL/BLB, wordline WL)]
Static Noise Margin (SNM)
SNM gives a measure of the cell’s stability by quantifying the DC noise required to flip the cell
SNM is the side length of the largest square that can be embedded in the lobes of the butterfly curve
[Figure: butterfly plot in the (Q, QB) plane (axes 0–0.3 V) showing the VTC of Inv 2, the mirrored VTC of Inv 1, the same curves shifted by VN = SNM, and the embedded square of side SNM; inset: 6T cell with series noise sources VN at the storage nodes]
* VTC: Voltage Transfer Curve
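The butterfly-curve construction can be reproduced numerically. The sketch below uses Seevinck's 45° coordinate rotation to find the largest embedded square; the tanh inverter model, its gain, and VDD = 1 V are illustrative assumptions, not a fitted device model:

```python
import numpy as np

VDD = 1.0  # supply voltage (illustrative)

def inv_vtc(v, gain=8.0):
    # Smooth idealized inverter VTC (tanh model; an assumption, not a device fit).
    return 0.5 * VDD * (1.0 - np.tanh(gain * (v - 0.5 * VDD)))

q = np.linspace(0.0, VDD, 4001)
qb1 = inv_vtc(q)                               # inverter 1: QB = f(Q)
qb2 = np.interp(q, inv_vtc(q)[::-1], q[::-1])  # inverter 2 mirrored: QB = f^-1(Q)

# Rotate the (Q, QB) plane by 45 degrees; the largest embedded square has a
# vertical diagonal of length SNM*sqrt(2) in the rotated frame.
s2 = np.sqrt(2.0)
u1, v1 = (q - qb1) / s2, (q + qb1) / s2        # u is monotone along each curve
u2, v2 = (q - qb2) / s2, (q + qb2) / s2
u = np.linspace(max(u1[0], u2[0]), min(u1[-1], u2[-1]), 4001)
d = np.interp(u, u1, v1) - np.interp(u, u2, v2)

# One lobe has d > 0, the other d < 0; the SNM is set by the smaller lobe.
snm = min(d.max(), -d.min()) / s2
print(f"SNM ~ {snm:.3f} V")
```

For identical inverters the two lobes are mirror images, so both give the same SNM; any asymmetry (e.g. from variation) shrinks one lobe and lowers the reported margin.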
Static Noise Margin with Scaling
• Typical cell SNM deteriorates with scaling
• Variations lead to failures from insufficient SNM
[Figure: technology and VDD scaling lower the mean SNM, while variations worsen the tail of the SNM distribution. Results obtained from simulations with Predictive Technology Models; Ref: PTM, Y. Cao '00]
Variability: Write Margin
• The write is the dominant ratioed fight in the cell
[Figure: three butterfly plots in normalized (Q, QB) coordinates with WL = 1 and BL/BLB driven to 0/1: cell stability prior to write (two stable points); a successful write (negative "SNM", only the written state remains); a write failure (positive SNM, both states persist)]
Variability: Cell Writability
• Write margin limits VDD scaling of 6T cells to about 600 mV, best case
• 65 nm process, VDD = 0.6 V
• Variability and the large number of cells make this worse
[Figure: write SNM (V) versus temperature (−40 to 120 °C) at VDD = 0.6 V for process corners TT, WW, SS, WS, SW; positive SNM indicates write fails]
Cell Array Power
• Leakage power dominates while the memory holds data
• The importance of gate tunneling and GIDL depends on the technology and the applied voltages
[Figure: 6T cell holding '1'/'0' with the sub-threshold leakage paths through the off transistors marked]
Using Threshold Voltage to Reduce Leakage
• High-VTH cells are necessary if everything else is kept the same
• To keep the leakage of a 1 Mb memory within bounds, VTH must be kept in the [0.4, 0.6] V range
[Figure: 1-Mb array retention current (A, log scale from 10⁻⁸ to 10⁻² A) versus average extrapolated VTH (V) at 25 °C, for junction temperatures Tj = 25 to 125 °C; Lg = 0.1 µm, W(QT) = 0.20 µm, W(QD) = 0.28 µm, W(QL) = 0.18 µm. High-speed designs sit near VTH = 0.49 V, low-power designs near 0.71 V. Extrapolated VTH = VTH(nA/µm) + 0.3 V. Ref: K. Itoh, ISCAS'06]
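The exponential dependence underlying this plot can be sketched with a first-order sub-threshold model; the values of I0 and the 100 mV/decade swing below are illustrative assumptions, not numbers from the slide:

```python
# First-order sub-threshold leakage model: I = I0 * 10^(-VTH / S).
# I0 (per-cell leakage at VTH = 0) and S (swing) are illustrative assumptions.
I0 = 1e-6        # A per cell at VTH = 0 (assumed)
S = 0.100        # sub-threshold swing, V/decade (typical ballpark)
CELLS = 1 << 20  # 1 Mb array

def retention_current(vth):
    """Total array retention current (A) for a given cell VTH (V)."""
    return CELLS * I0 * 10.0 ** (-vth / S)

for vth in (0.3, 0.4, 0.5, 0.6):
    print(f"VTH = {vth:.1f} V -> {retention_current(vth):.3e} A")
```

Each 100 mV of VTH buys a decade of leakage, so moving from 0.4 V to 0.6 V cuts the array retention current by 100x; that exponential leverage is what makes the [0.4, 0.6] V window effective.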
Multiple Threshold Voltages
• Dual-VTH cells with low-VTH access transistors provide a good tradeoff between power and delay [Ref: Hamzaoglu et al., TVLSI'02]
• Use high-VTH devices to lower leakage for a stored '0', which is much more common than a stored '1'
[Figure: two 6T cell schematics, one with low-VTH access transistors and one mixing high-VTH devices on the '0'-storing side with low-VTH devices elsewhere]
Multiple Voltages
• Selective usage of multiple voltages in the cell array
  • e.g., 16 fA/cell at 25 °C in a 0.13 µm technology [Ref: K. Osada, JSSC'03]
§ High VTH lowers sub-VTH leakage
§ A raised source (0.5 V), raised cell VDD (1.5 V), and lowered BLs (1.0 V) reduce gate stress while maintaining SNM, with WL = 0 V in standby
Power Breakdown During Read
• Accessing the correct cell: decoders, WL drivers
  • For lower power: hierarchical WLs, pulsed decoders
• Performing the read: charging and discharging the large BL capacitance
  • For lower power: sense amplifiers with low BL swing, lower VDD, hierarchical BLs (may require read assist), lower BL precharge
[Figure: read path from address through the decoder and WL to the memory cell, with BLs precharged to VDD_Prech and a sense amplifier producing the data]
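The payoff of small-swing sensing follows from the bitline energy E ≈ C_BL · V_pre · ΔV_BL, the energy to replace the charge drawn during the swing; the capacitance and voltages below are illustrative assumptions:

```python
# Energy drawn from the precharge supply per bitline per read:
#   E = C_BL * V_pre * dV   (charge C_BL * dV is replaced from V_pre)
# All numbers are illustrative assumptions, not from the slides.
C_BL = 200e-15   # bitline capacitance, 200 fF (assumed)
VDD = 1.0        # supply / precharge voltage (assumed)

def read_energy(v_pre, dv):
    """Energy (J) to restore a bitline that swung by dv from precharge v_pre."""
    return C_BL * v_pre * dv

full_swing = read_energy(VDD, VDD)    # BL discharged rail to rail
small_swing = read_energy(VDD, 0.1)   # 100 mV swing resolved by a sense amp

print(f"full swing : {full_swing * 1e15:.0f} fJ")
print(f"small swing: {small_swing * 1e15:.0f} fJ")
print(f"saving     : {1 - small_swing / full_swing:.0%}")
```

Letting the sense amplifier resolve a 100 mV differential instead of a full rail-to-rail swing saves about 90% of the bitline energy in this sketch, which is why sense amps with low BL swing lead the low-power list above.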
Hierarchical Word-line Architecture
• Reduces the amount of switched capacitance
• Saves power and lowers delay
[Refs: Rabaey, Prentice '03; T. Hirose, JSSC'90]
Hierarchical Bitlines
• Divide the bitlines hierarchically into local and global BLs; many variants are possible
• Reduces RC delay and also decreases CV² power
• Lowers the BL leakage seen by the accessed cell
[Figure: array partitioned into short local BLs that connect onto global BLs]
BL Leakage During Read Access
• Leakage into non-accessed cells raises power and delay
• It also degrades the BL differential
[Figure: bitline with the accessed cell reading a "1" while non-accessed cells storing "0" leak onto the same BL]
Bitline Leakage Solutions
• Hierarchical BLs
• Raise VSS in the cell (virtual VGND)
• Negative WL voltage (NWL)
• Longer access FETs
• Alternative bit-cells
• Active compensation
• Lower BL precharge voltage
[Figure: two cells storing "1"/"0": one with a raised cell VSS (VGND), one driven by a negative wordline voltage Vg]
[Ref: A. Agarwal, JSSC'03]
Lower Precharge Voltage
• A lower BL precharge voltage decreases power and improves the read SNM
• The internal bit-cell node rises less during a read
• There is a sharp limit: the cell is accidentally written if the access FET pulls the internal '1' low
VDD Scaling
• Lowering VDD (and other voltages) via classic voltage scaling
  • Saves power
  • Increases delay
  • Limited by lost margin (read and write)
• Recover read SNM with read assist:
  • Lower BL precharge
  • Boosted cell VDD [Ref: Bhavnagarwala '04, Zhang '06]
  • Pulsed WL and/or Write-After-Read [Ref: Khellah '06]
  • Lower WL [Ref: Ohbayashi '06]
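The power/delay tradeoff of classic voltage scaling can be sketched with the standard CV² switching-energy and alpha-power delay models; the VTH and α values are illustrative assumptions:

```python
# Classic voltage-scaling tradeoff:
#   switching energy  E ~ C * VDD^2
#   gate delay        t ~ VDD / (VDD - VTH)^alpha   (alpha-power law)
# VTH and alpha below are illustrative assumptions, not from the slides.
VTH = 0.3     # threshold voltage, V (assumed)
ALPHA = 1.3   # velocity-saturation exponent (typical short-channel ballpark)

def energy(vdd, c=1.0):
    # Switching energy per transition, normalized capacitance.
    return c * vdd ** 2

def delay(vdd):
    # Relative gate delay from the alpha-power law.
    return vdd / (vdd - VTH) ** ALPHA

v_nom, v_low = 1.0, 0.6
print(f"energy ratio: {energy(v_low) / energy(v_nom):.2f}")  # quadratic savings
print(f"delay ratio : {delay(v_low) / delay(v_nom):.2f}")    # slowdown
```

Scaling from 1.0 V to 0.6 V cuts the switching energy to 36% but slows the gate by roughly 1.8x in this sketch; the margin loss noted above, not the delay, is what ultimately caps the scaling for SRAM.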
24
Power Breakdown During Write
• Accessing the cell: similar to read
  • For lower power: hierarchical WLs
• Performing the write: traditionally the BLs are driven full swing
  • For lower power: charge sharing, exploiting data dependencies, low-swing BLs with amplification
[Figure: write path from address through the decoder and WL to the memory cell, with BLs precharged to VDD_Prech and driven by the data]
Charge Recycling to Reduce Write Power
• Share charge between the BLs of a pair (or between pairs of BLs)
• Saves energy on consecutive write operations
• The control overhead needs to be assessed
• Sequence: hold the old values (BL = 0 V, BLB = VDD) → connect the floating BLs, equalizing both to VDD/2 → disconnect and drive the new values (BL = VDD, BLB = 0 V)
• Basic charge recycling saves 50% of the write power in theory
[Refs: K. Mai, JSSC'98; G. Ming, ASICON'05]
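The theoretical 50% figure falls out of simple CV² accounting; the sketch below walks through one write cycle with charge sharing (the capacitance value is an illustrative assumption):

```python
# One write on a complementary BL pair, with and without charge recycling.
# Energy is counted as charge drawn from the VDD supply; C_BL is assumed.
C_BL = 200e-15  # per-bitline capacitance (illustrative)
VDD = 1.0

# Without recycling: the high-going BL is charged 0 -> VDD from the supply.
e_conventional = C_BL * VDD * VDD

# With recycling: shorting BL (0 V) and BLB (VDD) equalizes both to VDD/2 for
# free; the supply then only provides the remaining VDD/2 -> VDD half-swing.
e_recycled = C_BL * VDD * (VDD - VDD / 2)

print(f"conventional: {e_conventional * 1e15:.0f} fJ per write")
print(f"recycled    : {e_recycled * 1e15:.0f} fJ per write")
print(f"saving      : {1 - e_recycled / e_conventional:.0%}")
```

This matches the "saves 50% power in theory" claim; real savings are lower once the equalization switches and control logic (the overhead the slide warns about) are included.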
Memory Statistics
• 0's are more common than 1's
  • SPEC2000: 90% of data bits are 0
  • SPEC2000: 85% of instruction bits are 0
• Write the value inverted when that reduces the number of 1's, storing an invert flag [Ref: Y. Chang, ISLPED'99]
• A new bitcell with 1 read and 1 write port exploits this asymmetry: write '0' with WZ = 0, WWL = 1, WS = 1; write '1' with WZ = 1, WWL = 1, WS = 0 [Ref: Y. Chang, TVLSI'04]
[Figure: asymmetric bitcell with bitlines BL/BLB, wordline WL, and write signals WS, WWL, WZ]
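The inverted-write idea above can be sketched as a simple encoder: if a word has more 1's than 0's, store its complement plus a one-bit invert flag. The word width and function names are illustrative, not from the reference:

```python
WIDTH = 32  # word width in bits (illustrative)
MASK = (1 << WIDTH) - 1

def encode(word):
    """Return (stored_word, invert_flag) minimizing the number of stored 1's."""
    if bin(word & MASK).count("1") > WIDTH // 2:
        return (~word) & MASK, 1   # store the complement, set the flag
    return word & MASK, 0

def decode(stored, flag):
    """Recover the original word from the stored value and flag."""
    return (~stored) & MASK if flag else stored

word = 0xFFFF00FF                  # 24 ones out of 32: worth inverting
stored, flag = encode(word)
assert decode(stored, flag) == word
print(f"stored 1's: {bin(stored).count('1')} (was {bin(word).count('1')}), flag={flag}")
```

The guarantee is that the stored word never has more than WIDTH/2 ones, at the cost of one extra flag bit per word; with the 85–90% zero statistics above, most words need no inversion at all.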
Low-Swing Write
• Drive the BLs with a low swing
• Use amplification inside the cell to restore full levels
• The write voltage is VWR = VDD − VTH − ΔVBL
[Figure: column with BL/BLB precharged to VDD_Prech, equalization (EQ), select (SLC), and write-enable (WE) signals, and a column decoder driving Din at VWR; waveforms show BL/BLB swinging between VDD − VTH and VDD − VTH − ΔVBL while Q/QB are restored in the cell. Ref: K. Kanda, JSSC'04]
Write Margin
• A fundamental limit to most power-reducing techniques
• Recover write margin with write assist, e.g.:
  • Boosted WL
  • Collapsed cell VDD [Itoh '96, Bhavnagarwala '04]
  • Raised cell VSS [Yamaoka '04, Kanda '04]
  • Cell with amplification [Kanda '04]
Non-traditional Cells
• The key tradeoff is with functional robustness
• Use an alternative cell to improve robustness, then trade the gained margin for power savings
  • e.g., remove the read SNM
• 8T SRAM cell [Ref: L. Chang, VLSI'05]:
  • Register-file-style cell with 1 read and 1 write port
  • Read SNM eliminated
  • Allows lower VDD
  • 30% area overhead
  • Robust layout
[Figure: 8T cell with write port (WBL/WBLB, WWL) and separate read port (RWL, RBL)]
Cells with Pseudo-Static SNM Removal
• Isolate the stored data during the read
• Storage is dynamic for the duration of the read
[Figure: two bitcell variants, one with a differential read [Ref: S. Kosonocky, ISCICT'06] and one with a single-ended read [Ref: K. Takeda, JSSC'06]]
Emerging Devices: Double-Gate MOSFET
• Emerging devices allow new SRAM structures
• Back-gate biasing of a thin-body MOSFET provides improved control of short-channel effects and re-instates effective dynamic control of VTH
[Figure: FinFET structures. Double-gated (DG) MOSFET: fin height HFIN = W/2, gate length Lg, fin width TSi. Back-gated (BG) MOSFET: independent front and back gates (one switching gate, Gate1; one VTH-control gate, Gate2), fin height HFIN = W. Ref: Z. Guo, ISLPED'05]
6T SRAM Cell with Feedback
• Double-gated (DG) NMOS pull-down and PMOS load devices
• Back-gated (BG) NMOS access devices dynamically increase the β-ratio
• SNM during read ~ 300 mV
• Area penalty ~ 19%
[Figure: butterfly curves (Vsn1 versus Vsn2, 0–1 V) for the 6T DG-MOS and 6T BG-MOS cells. Ref: Z. Guo, ISLPED'05]
Summary and Perspectives
• Functionality is the main constraint in SRAM
  • Variation makes the outlying cells the limiters
  • Look at hold, read, and write modes
• Use various methods to improve robustness, then trade the margin off for power savings
  • Cell voltages, thresholds
  • Novel bit-cells
  • Emerging devices
• Embedded memory is a major threat to continued technology scaling; innovative solutions are necessary