lecture25-memorybwrcs.eecs.berkeley.edu/classes/icdesign/ee141_s10/...ee141 1 eecs141ee141 lecture...
TRANSCRIPT
EE141
1
EE141 EECS141 1 Lecture #25
EE141 EECS141 2 Lecture #25
Hw 8 Posted Last one to be graded Due Friday April 30
Hw 6 and 7 Graded and available Project Phase 2 Graded Project Phase 3 Launch Today
EE141
2
EE141 EECS141 3 Lecture #25
0
1
2
3
4
5
6
1.5 2 2.5 3 3.5 4
Frequency [GHz]
Performance Histogram
EE141 EECS141 4 Lecture #25
0
1
2
3
4
1.5 2 2.5 3 3.5 4 Frequency [GHz]
0.0E+00
5.0E-04
1.0E-03
1.5E-03
2.0E-03
2.5E-03
3.0E-03
3.5E-03
0.0E+00 1.0E+09 2.0E+09 3.0E+09
Pow
er [W
]
Frequency [Hz]
Performance - pruned
EE141
3
EE141 EECS141 5 Lecture #25
Max: 98 Avg: 78.9 StDev: 12.9 Median: 81 0
1
2
3
4
5
6
50 60 70 80 90 100
EE141 EECS141 6 Lecture #25
Design implementation of 256-bit code
Create floorplan to estimate wirelength
Optimize circuit such that energy is minimized for max delay of 10 nsec
Determine also Tdmin and Emin for that design
Present results in poster
Energy
Delay
(Td,max, Emin)
(Td,min, Emax)
(Td, E)
EE141
4
EE141 EECS141 7 Lecture #25
Template:
Group A:
Group B: Same but period is 4
Group C: Same but period is 8
Group D: Same but period is 16
EE141 EECS141 8 Lecture #25
!
EE141
5
EE141 EECS141 9 Lecture #25
!
EE141 EECS141 10 Lecture #25
Energy
Delay
10ns
? Joule
Optimization target The extremes
EE141
6
EE141 EECS141 11 Lecture #25
Last lecture Multipliers
Today’s lecture Memory cells
Reading (Ch 12)
EE141 EECS141 12 Lecture #25 12
EE141
7
EE141 EECS141 13 Lecture #25
Read-Write Memory Non-Volatile Read-Write
Memory Read-Only Memory
EPROM E 2 PROM
FLASH
Random Access
Non-Random Access
SRAM DRAM
Mask-Programmed Programmable (PROM)
FIFO
Shift Register CAM
LIFO
EE141 EECS141 14 Lecture #25
STATIC (SRAM)
DYNAMIC (DRAM)
Data stored as long as supply is applied Larger (6 transistors/cell) Fast Differential (usually)
Periodic refresh required Smaller (1-3 transistors/cell) Slower Single Ended
EE141
8
EE141 EECS141 15 Lecture #25
Conceptual: linear array Each box holds some data But this does not lead to a nice layout shape Too long and skinny
Create a 2-D array Decode Row and Column
address to get data
EE141 EECS141 16 Lecture #25
Word 0 Word 1 Word 2
Word N-2 Word N-1
Storage cell
M bits M bits
N words
S 0 S 1 S 2
S N-2
A 0 A 1
A K-1
K = log 2 N S N -1
Word 0 Word 1 Word 2
Word N-2 Word N-1
Storage cell
S 0
Input-Output ( M bits)
Intuitive architecture for N x M memory Too many select signals:
N words == N select signals K = log 2 N Decoder reduces the number of select signals
Input-Output ( M bits)
Decoder
EE141
9
EE141 EECS141 17 Lecture #25
EE141 EECS141 18 Lecture #25
• If D is high, D_b will be driven low • Which makes D stay high
• Positive feedback
EE141
10
EE141 EECS141 19 Lecture #25
EE141 EECS141 20 Lecture #25
Access transistor must be able to overpower the feedback
EE141
11
EE141 EECS141 21 Lecture #25
EE141 EECS141 22 Lecture #25
Complementary data values are written (read) from two sides
EE141
12
EE141 EECS141 23 Lecture #25 EE141
WL
BL
V DD
M 5 M 6
M 4
M 1
M 2
M 3
BL
Q Q
EE141 EECS141 24 Lecture #25
0 1
• Q_b will get pulled up when WL first goes high • Reading the cell should not destroy the stored value
Read
EE141
13
EE141 EECS141 25 Lecture #25 EE141
WL
BL V DD
M 5 M 6
M 4
M 1 V DD V DD V DD
BL Q = 1 Q = 0
C bit C bit
EE141 EECS141 26 Lecture #25 EE141
0 0
0.2 0.4 0.6 0.8
1 1.2
0.5 1 1.2 1.5 2 Cell Ratio (CR)
2.5 3
Volta
ge R
ise
(V)
EE141
14
EE141 EECS141 27 Lecture #25 EE141
BL = 1 BL = 0
Q = 0 Q = 1
M 1
M 4
M 5 M 6
V DD
V DD WL
EE141 EECS141 28 Lecture #25 EE141
EE141
15
EE141 EECS141 29 Lecture #25
SNM
Obtained by breaking the feedback between the inverters
EE141 EECS141 30 Lecture #25
V DD
GND
WL
BL BLB
Compact cell Bitlines: M2 Wordline: bootstrapped in M3
EE141
16
EE141 EECS141 31 Lecture #25
ST/Philips/Motorola
Access Transistor
Pull down Pull up
EE141 EECS141 32 Lecture #25
EE141
17
EE141 EECS141 33 Lecture #25
EE141 EECS141 34 Lecture #25 EE141
Static power dissipation -- Want R L large Bit lines precharged to V DD to address t p problem
M 3
R L R L V DD
WL
Q Q
M 1 M 2
M 4
BL BL
EE141
18
EE141 EECS141 35 Lecture #25 EE141
No constraints on device ratios Reads are non-destructive Value stored at node X when writing a “1” = V WWL -V Tn
WWL BL 1
M 1 X M 3
M 2 C S
BL 2
RWL
V DD V DD - V T
D V V DD - V T BL 2 BL 1 X RWL WWL
EE141 EECS141 36 Lecture #25 EE141
BL2 BL1 GND
RWL
WWL
M3
M2
M1
WWL BL 1
M 1 X M 3
M 2 C S
BL 2
RWL
EE141
19
EE141 EECS141 37 Lecture #25 EE141
Write: C S is charged or discharged by asserting WL and BL. Read: Charge redistribution takes places between bit line and storage capacitance
Voltage swing is small; typically around 250 mV.
Δ V BL V PRE – V BIT V PRE – C S C S C BL + ------------ = = V
EE141 EECS141 38 Lecture #25 EE141
Uses Polysilicon-Diffusion Capacitance Expensive in Area
M 1 word line
Diffused bit line
Polysilicon gate
Polysilicon plate
Capacitor
Cross-section Layout
Metal word line
Poly
SiO 2
Field Oxide n + n + Inversion layer induced by plate bias
Poly
EE141
20
EE141 EECS141 39 Lecture #25
EE141 EECS141 40 Lecture #25
Cell Plate Si
Capacitor Insulator
Storage Node Poly
2nd Field Oxide
Refilling Poly
Si Substrate
Trench Cell Stacked-capacitor Cell
Capacitor dielectric layer Cell plate Word line
Insulating Layer
Isolation Transfer gate Storage electrode
EE141
21
EE141 EECS141 41 Lecture #25
WL BL
WL
BL
1 WL
BL
WL BL
WL BL
0
VDD
WL
BL
GND
Diode ROM MOS ROM 1 MOS ROM 2
EE141 EECS141 42 Lecture #25
WL [0] GND
BL [0]
WL [1]
WL [2]
WL [3]
V DD
BL [1]
Pull-up devices
BL [2] BL [3]
GND
EE141
22
EE141 EECS141 43 Lecture #25
Programmming using the Active Layer Only
Polysilicon
Metal1
Diffusion
Metal1 on Diffusion
Cell (9.5λ x 7λ)
EE141 EECS141 44 Lecture #25 44
Polysilicon
Metal1
Diffusion
Metal1 on Diffusion
Cell (11λ x 7λ)
Programmming using the Contact Layer Only
EE141
23
EE141 EECS141 45 Lecture #25
All word lines high by default with exception of selected row
WL [0]
WL [1]
WL [2]
WL [3]
V DD Pull-up devices
BL [3] BL [2] BL [1] BL [0]
EE141 EECS141 46 Lecture #25
No contact to VDD or GND necessary;
Loss in performance compared to NOR ROM drastically reduced cell size
Polysilicon
Diffusion
Metal1 on Diffusion
Cell (8λ x 7λ)
Programmming using the Metal-1 Layer Only
EE141
24
EE141 EECS141 47 Lecture #25 47
Cell (5λ x 6λ)
Polysilicon
Threshold-altering implant
Metal1 on Diffusion
Programmming using Implants Only
EE141 EECS141 48 Lecture #25
Floating gate Source
Substrate
Gate Drain
n + n +_ p
t ox t ox
Device cross-section Schematic symbol
G
S
D
EE141
25
EE141 EECS141 49 Lecture #25
0 V
- 5 V 0 V
D S
Removing programming voltage leaves charge trapped
5 V
- 2.5 V 5 V
D S
Programming results in higher V T .
20 V
10 V 5 V 20 V
D S
Avalanche injection
EE141 EECS141 50 Lecture #25
Floating gate Source
Substrate p
Gate Drain
n 1 n 1
FLOTOX transistor Fowler-Nordheim I-V characteristic
20 – 30 nm
10 nm
-10 V 10 V
I
V GD
EE141
26
EE141 EECS141 51 Lecture #25
WL
BL
V DD
Absolute threshold control is hard Unprogrammed transistor might be depletion 2 transistor cell
EE141 EECS141 52 Lecture #25
EPROM Flash Courtesy Intel
EE141
27
EE141 EECS141 53 Lecture #25
Decoders Sense Amplifiers Input/Output Buffers Control / Timing Circuitry
EE141 EECS141 54 Lecture #25
Collection of 2M complex logic gates Organized in regular and dense fashion
(N)AND Decoder
NOR Decoder
EE141
28
EE141 EECS141 55 Lecture #25
Look at decoder for 256x256 memory block (8KBytes)
EE141 EECS141 56 Lecture #25
Goal: Build fastest possible decoder with static CMOS logic
What we know Basically need 256 AND
gates, each one of them drives one word line
N=8
EE141
29
EE141 EECS141 57 Lecture #25
Each word line has 256 cells connected to it Total output load is 256*Ccell + Cwire
Assume that decoder input capacitance is Caddress=4*Ccell
Each address drives 28/2 AND gates A0 drives ½ of the gates, A0_b the other ½ of the
gates Neglecting Cwire, the fan-out on each one of the
16 address wires is: B
EE141 EECS141 58 Lecture #25
FB of at least 213 means that we will want to use more than log4(213) = 6.5 stages to implement the AND8
Need many stages anyways So what is the best way to implement the AND
gate? Will see next that it’s the one with the most stages
and least complicated gates
EE141
30
EE141 EECS141 59 Lecture #25
LE=10/3 1 ΠLE = 10/3 P = 8 + 1
LE=2 5/3 ΠLE = 10/3 P = 4 + 2
LE=4/3 5/3 4/3 1 ΠLE = 80/27 P = 2 + 2 + 2 + 1
EE141 EECS141 60 Lecture #25
Using 2-input NAND gates 8-input gate takes 6 stages
Total LE is (4/3)3 ≈ 2.4 So PE is 2.4*213 – optimal N of ~7.1
EE141
31
EE141 EECS141 61 Lecture #25
256 8-input AND gates Each built out of
tree of NAND gates and inverters
Issue: Every address line has
to drive 128 gates (and wire) right away
Can’t build gates small enough - Forces us to add buffers just to drive address inputs
EE141 EECS141 62 Lecture #25
EE141
32
EE141 EECS141 63 Lecture #25
Use a single gate for each of the shared terms E.g., from A0, A0, A1, and A1, generate four
signals: A0A1, A0A1, A0A1, A0A1
In other words, we are decoding smaller groups of address bits first And using the “predecoded” outputs to do
the rest of the decoding
EE141 EECS141 64 Lecture #25
EE141
33
EE141 EECS141 65 Lecture #25
• • •
• • •
A 2 A 2
A 2 A 3
WL 0
A 2 A 3 A 2 A 3 A 2 A 3
A 3 A 3 A 0 A 0
A 0 A 1 A 0 A 1 A 0 A 1 A 0 A 1
A 1 A 1
WL 1
Multi-stage implementation improves performance
EE141 EECS141 66 Lecture #25
Precharge devices
V DD φ
GND
WL 3 WL 2 WL 1 WL 0
A 0 A 0
GND
A 1 A 1 φ
WL 3
A 0 A 0 A 1 A 1
WL 2
WL 1
WL 0
V DD
V DD
V DD
V DD
2-input NOR decoder 2-input NAND decoder
EE141
34
EE141 EECS141 67 Lecture #25
Advantages: speed (tpd does not add to overall memory access time) Only one extra transistor in signal path Disadvantage: Large transistor count
2-input NOR decoder
A 0 S 0
BL 0 BL 1 BL 2 BL 3
A 1
S 1 S 2 S 3
D
EE141 EECS141 68 Lecture #25
Number of devices drastically reduced Delay increases quadratically with # of sections; prohibitive for large decoders
buffers progressive sizing combination of tree and pass transistor approaches
Solutions:
BL 0 BL 1 BL 2 BL 3
D
A 0
A 0
A 1
A 1
EE141
35
EE141 EECS141 69 Lecture #25
t p C Δ V ⋅
I av ---------------- =
make Δ V as small as possible
small large
Idea: Use Sense Amplifer
EE141 EECS141 70 Lecture #25
Directly applicable to SRAMs
M 4
M 1
M 5
M 3
M 2
V DD
bit bit
SE
Out y
EE141
36
EE141 EECS141 71 Lecture #25
EE141 EECS141 72 Lecture #25
Initialized in its meta-stable point with EQ Once adequate voltage gap created, sense amp enabled with SE Positive feedback quickly forces output to a stable operating point.
EQ
V DD BL BL
SE
SE