lecture25-memorybwrcs.eecs.berkeley.edu/classes/icdesign/ee141_s10/...ee141 1 eecs141ee141 lecture...

EE141

1

EE141 EECS141 1 Lecture #25


  Hw 8 Posted   Last one to be graded   Due Friday April 30

  Hw 6 and 7 Graded and available   Project Phase 2 Graded   Project Phase 3 Launch Today

[Figure: project results. Titles: "Performance Histogram" and "Performance - pruned". Axes: count vs. Frequency [GHz] (1.5 to 4); Power [W] vs. Frequency [Hz]]

  Max: 98   Avg: 78.9   StDev: 12.9   Median: 81

[Figure: histogram over the range 50 to 100]


  Design implementation of 256-bit code

  Create floorplan to estimate wirelength

  Optimize circuit such that energy is minimized for max delay of 10 nsec

  Determine also Tdmin and Emin for that design

  Present results in poster

[Figure: energy vs. delay curve marking the points (Td,min, Emax), (Td, E), and (Td,max, Emin)]


Template:

Group A:

Group B: Same but period is 4

Group C: Same but period is 8

Group D: Same but period is 16


[Figure: energy vs. delay plot showing the optimization target (delay of 10 ns, energy in joules to be determined) and the extremes]


 Last lecture  Multipliers

 Today’s lecture  Memory cells

 Reading (Ch 12)



  Read-Write Memory
      Random Access: SRAM, DRAM
      Non-Random Access: FIFO, LIFO, Shift Register, CAM

  Non-Volatile Read-Write Memory
      EPROM, E²PROM, FLASH

  Read-Only Memory
      Mask-Programmed, Programmable (PROM)


  STATIC (SRAM): data stored as long as supply is applied; larger (6 transistors/cell); fast; differential (usually)

  DYNAMIC (DRAM): periodic refresh required; smaller (1-3 transistors/cell); slower; single-ended



  Conceptual: linear array   Each box holds some data   But this does not lead to a nice layout shape   Too long and skinny

  Create a 2-D array   Decode Row and Column address to get data


[Figure: intuitive architecture for an N x M memory: N words (Word 0 to Word N-1) of M bits each, one select signal S_0 to S_(N-1) per word, M-bit input-output]

Too many select signals: N words == N select signals

[Figure: the same array driven by a decoder with K address bits A_0 to A_(K-1), where K = log2 N]

Decoder reduces the number of select signals
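The saving from the decoder can be illustrated with a minimal sketch. The function names `address_bits` and `decode` are hypothetical helpers introduced here, not part of the lecture; the sketch only assumes that one word is selected at a time (a one-hot select vector).

```python
def address_bits(n_words: int) -> int:
    """Number of address bits K needed to select one of n_words words."""
    if n_words < 1:
        raise ValueError("n_words must be positive")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, using exact integers.
    return (n_words - 1).bit_length()


def decode(address: int, k: int) -> list[int]:
    """One-hot decode a K-bit address into 2**K select signals."""
    if not 0 <= address < 2 ** k:
        raise ValueError("address out of range")
    return [1 if i == address else 0 for i in range(2 ** k)]


k = address_bits(256)        # address wires instead of 256 select wires
selects = decode(5, k)       # only select line 5 is asserted
```

For a 256-word array the external interface shrinks from 256 select signals to `k` address bits, and the decoder regenerates the select lines internally.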




•  If D is high, D_b will be driven low •  Which makes D stay high

•  Positive feedback




Access transistor must be able to overpower the feedback




Complementary data values are written (read) from two sides

[Figure: 6-transistor SRAM cell: word line WL, bit lines BL and BL_b, transistors M1 to M6, supply V_DD, storage nodes Q and Q_b]

•  Q_b will get pulled up when WL first goes high •  Reading the cell should not destroy the stored value

Read

[Figure: SRAM read analysis: WL at V_DD, both bit lines precharged to V_DD with capacitance C_bit, Q = 1 and Q_b = 0; transistors M1, M4, M5, M6. Plot: Voltage Rise (V) vs. Cell Ratio (CR)]

[Figure: SRAM write analysis: WL at V_DD, one bit line driven to 1 and the other to 0, storage nodes initially at 0 and 1; transistors M1, M4, M5, M6]




SNM

Obtained by breaking the feedback between the inverters


[Figure: SRAM cell layout with V_DD, GND, WL, BL, and BLB]

Compact cell   Bitlines: M2   Wordline: bootstrapped in M3


 ST/Philips/Motorola

[Figure: cell layout with the access transistor, pull-down, and pull-up devices labeled]



Static power dissipation: want R_L large
Bit lines precharged to V_DD to address the t_p problem

[Figure: resistive-load SRAM cell: load resistors R_L to V_DD, transistors M1 to M4, word line WL, bit lines BL and BL_b, storage nodes Q and Q_b]


No constraints on device ratios
Reads are non-destructive
Value stored at node X when writing a “1” = V_WWL - V_Tn

[Figure: 3-transistor DRAM cell (M1, M2, M3, storage capacitance C_S at node X; write word line WWL, read word line RWL, bit lines BL1 and BL2), its waveforms (WWL, RWL, X at V_DD - V_T, BL1, BL2 with swing ΔV), and its layout]


Write: C_S is charged or discharged by asserting WL and BL.
Read: charge redistribution takes place between the bit line and the storage capacitance.

$$\Delta V = V_{BL} - V_{PRE} = (V_{BIT} - V_{PRE}) \frac{C_S}{C_S + C_{BL}}$$

Voltage swing is small; typically around 250 mV.
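As a rough numerical illustration of the charge redistribution during a read, the sketch below evaluates ΔV = (V_BIT - V_PRE) · C_S / (C_S + C_BL). The capacitance and voltage values are assumptions chosen for illustration only; they are not taken from the lecture.

```python
def bitline_swing(v_bit: float, v_pre: float, c_s: float, c_bl: float) -> float:
    """Bit-line voltage change after the cell shares charge with the bit line.

    delta_V = (V_BIT - V_PRE) * C_S / (C_S + C_BL)
    """
    return (v_bit - v_pre) * c_s / (c_s + c_bl)


# Assumed, illustrative values (not from the lecture).
C_S = 30e-15     # storage capacitance in farads
C_BL = 300e-15   # bit-line capacitance in farads
V_PRE = 1.25     # bit-line precharge voltage in volts

swing_zero = bitline_swing(0.0, V_PRE, C_S, C_BL)  # cell holds a 0: bit line dips
swing_one = bitline_swing(2.5, V_PRE, C_S, C_BL)   # cell holds a full-rail 1: bit line rises
```

Because C_BL is much larger than C_S, the swing is only a small fraction of the stored voltage difference, which is why a sense amplifier is needed to recover the value.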


Uses Polysilicon-Diffusion Capacitance
Expensive in Area

[Figure: 1-transistor DRAM cell, cross-section and layout: metal word line, polysilicon gate, polysilicon plate, diffused bit line, capacitor, SiO2, field oxide, n+ regions, inversion layer induced by plate bias]



[Figure: Trench cell (cell plate Si, capacitor insulator, storage node poly, 2nd field oxide, refilling poly, Si substrate) and Stacked-capacitor cell (capacitor dielectric layer, cell plate, word line, insulating layer, isolation, transfer gate, storage electrode)]


[Figure: ROM cells storing 1 and 0 (word line WL, bit line BL, V_DD, GND): Diode ROM, MOS ROM 1, MOS ROM 2]


[Figure: 4 x 4 MOS ROM array: word lines WL[0] to WL[3], bit lines BL[0] to BL[3], pull-up devices to V_DD, GND lines]


Programming using the Active Layer Only: Cell (9.5λ x 7λ)

Programming using the Contact Layer Only: Cell (11λ x 7λ)

[Figures: cell layouts; layers shown are Polysilicon, Metal1, Diffusion, and Metal1 on Diffusion]


All word lines high by default with exception of selected row

[Figure: 4 x 4 ROM array: word lines WL[0] to WL[3], bit lines BL[0] to BL[3], pull-up devices to V_DD]


No contact to V_DD or GND necessary; drastically reduced cell size
Loss in performance compared to NOR ROM

Programming using the Metal-1 Layer Only: Cell (8λ x 7λ)

[Figure: cell layout; layers shown are Polysilicon, Diffusion, and Metal1 on Diffusion]


Programming using Implants Only: Cell (5λ x 6λ)

[Figure: cell layout; layers shown are Polysilicon, Threshold-altering implant, and Metal1 on Diffusion]


[Figure: floating-gate transistor. Device cross-section (gate, floating gate, source and drain n+ regions, p substrate, oxide thickness t_ox) and schematic symbol (G, S, D)]


Avalanche injection (figure labels: 20 V, 10 V, 5 V, 20 V)

Removing programming voltage leaves charge trapped (figure labels: 0 V, -5 V, 0 V)

Programming results in higher V_T (figure labels: 5 V, -2.5 V, 5 V)


[Figure: FLOTOX transistor cross-section (gate, floating gate, source and drain n+ regions, p substrate; oxide thicknesses 20-30 nm and 10 nm) and Fowler-Nordheim I-V characteristic (I vs. V_GD, from -10 V to 10 V)]


[Figure: cell schematic with WL, BL, and V_DD]

Absolute threshold control is hard
Unprogrammed transistor might be depletion
2 transistor cell

[Figure: EPROM and Flash cells. Courtesy Intel]


  Decoders   Sense Amplifiers   Input/Output Buffers   Control / Timing Circuitry


Collection of 2^M complex logic gates, organized in regular and dense fashion

(N)AND Decoder

NOR Decoder


 Look at decoder for 256x256 memory block (8KBytes)


  Goal: Build fastest possible decoder with static CMOS logic

  What we know   Basically need 256 AND gates, each one of them drives one word line   N=8


  Each word line has 256 cells connected to it   Total output load is 256*Ccell + Cwire

  Assume that decoder input capacitance is Caddress=4*Ccell

  Each address drives 2^8/2 AND gates   A0 drives ½ of the gates, A0_b the other ½ of the gates

  Neglecting Cwire, the fan-out on each one of the 16 address wires is:

$$B \cdot F = \frac{2^8}{2} \cdot \frac{256\,C_{cell}}{4\,C_{cell}} = 128 \cdot 64 = 2^{13}$$


  FB of at least 2^13 means that we will want to use more than log4(2^13) = 6.5 stages to implement the AND8

  Need many stages anyway   So what is the best way to implement the AND gate?   Will see next that it’s the one with the most stages and least complicated gates
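The stage-count estimate can be checked with a few lines of arithmetic. This is a minimal sketch that normalizes capacitances to Ccell, neglects Cwire as the slide does, and assumes a target effort of 4 per stage (the base of the log4 above); the variable names are illustrative.

```python
import math

C_CELL = 1.0                # normalize all capacitances to one cell
WORD_LINE_CELLS = 256       # cells loading each word line
ADDRESS_BITS = 8            # 2**8 = 256 word lines
C_ADDRESS = 4 * C_CELL      # assumed decoder input capacitance per address wire

branching = 2 ** ADDRESS_BITS // 2               # each address wire drives half the gates
fanout = WORD_LINE_CELLS * C_CELL / C_ADDRESS    # electrical fan-out, Cwire neglected
path_bf = branching * fanout                     # branching times fan-out
stages = math.log(path_bf, 4)                    # stages at an effort of 4 per stage
```

With these numbers `path_bf` is 2^13 and `stages` comes out at about 6.5, before any logical effort of the gates is included.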



Implementation                  LE per stage        ΠLE      P
NAND8 + INV                     10/3, 1             10/3     8 + 1
NAND4 + NOR2                    2, 5/3              10/3     4 + 2
NAND2 + NOR2 + NAND2 + INV      4/3, 5/3, 4/3, 1    80/27    2 + 2 + 2 + 1


  Using 2-input NAND gates   8-input gate takes 6 stages

  Total LE is (4/3)^3 ≈ 2.4   So PE is 2.4*2^13, giving an optimal N of ~7.1
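The logical-effort products for the candidate AND8 implementations can be reproduced with exact fractions. This sketch assumes the usual logical-effort values (inverter 1, n-input NAND (n + 2)/3, 2-input NOR 5/3); the gate labels in the dictionary are this sketch's naming, inferred from the LE and P values on the slide.

```python
import math
from fractions import Fraction

INV = Fraction(1)
NOR2 = Fraction(5, 3)


def nand(n: int) -> Fraction:
    """Logical effort of an n-input NAND under the assumed model."""
    return Fraction(n + 2, 3)


options = {
    "NAND8 + INV": [nand(8), INV],
    "NAND4 + NOR2": [nand(4), NOR2],
    "NAND2 + NOR2 + NAND2 + INV": [nand(2), NOR2, nand(2), INV],
    "3 x (NAND2 + INV)": [nand(2), INV] * 3,
}
le_product = {name: math.prod(gates) for name, gates in options.items()}

BF = 2 ** 13                                          # branching times fan-out
path_effort = float(le_product["3 x (NAND2 + INV)"]) * BF
optimal_stages = math.log(path_effort, 4)             # assumes an effort of 4 per stage
```

The all-NAND2 version has the smallest LE product (64/27, about 2.4), and its optimal stage count of roughly 7.1 is close to the 6 stages the gate tree already needs.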



  256 8-input AND gates   Each built out of a tree of NAND gates and inverters

  Issue:   Every address line has to drive 128 gates (and wire) right away

  Can’t build gates small enough - Forces us to add buffers just to drive address inputs


  Use a single gate for each of the shared terms   E.g., from A0, A0_b, A1, and A1_b, generate four signals: A0·A1, A0_b·A1, A0·A1_b, A0_b·A1_b

  In other words, we are decoding smaller groups of address bits first   And using the “predecoded” outputs to do the rest of the decoding
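A behavioral sketch of predecoding, for a 4-bit address, might look as follows. The helper names `predecode` and `word_lines` are hypothetical; the sketch models only the logic function, not gate sizing or delay.

```python
def predecode(bit_a: int, bit_b: int) -> list[int]:
    """One-hot predecode of a 2-bit group; the asserted index is 2*bit_b + bit_a."""
    return [int(i == 2 * bit_b + bit_a) for i in range(4)]


def word_lines(address: int) -> list[int]:
    """Decode a 4-bit address using two 2-bit predecoders and 16 two-input ANDs."""
    a = [(address >> i) & 1 for i in range(4)]
    low = predecode(a[0], a[1])     # the four combinations of A0 and A1
    high = predecode(a[2], a[3])    # the four combinations of A2 and A3
    # Each word line ANDs one predecoded signal from each group.
    return [high[i // 4] & low[i % 4] for i in range(16)]


wl = word_lines(0b1010)   # asserts word line 10 only
```

Each predecoded signal is generated once and shared by four word-line gates, so each address input drives far fewer gates than in the flat AND implementation.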




[Figure: hierarchical decoder: predecoded combinations of A0/A1 and of A2/A3 (true and complement) drive the gates for WL0, WL1, ...]

Multi-stage implementation improves performance


[Figure: dynamic decoders with precharge devices clocked by φ: 2-input NOR decoder and 2-input NAND decoder, each driving WL0 to WL3 from A0, A0_b, A1, A1_b]


Advantages: speed (t_pd does not add to overall memory access time); only one extra transistor in signal path
Disadvantage: large transistor count

[Figure: 4-to-1 pass-transistor column decoder: a 2-input NOR decoder on A0, A1 generates S0 to S3, which connect BL0 to BL3 to D]


Number of devices drastically reduced
Delay increases quadratically with # of sections; prohibitive for large decoders

Solutions: buffers; progressive sizing; combination of tree and pass transistor approaches

[Figure: 4-to-1 tree-based column decoder: BL0 to BL3 reach D through pass transistors controlled by A0, A0_b, A1, A1_b]
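The device count and the quadratic delay growth of the tree decoder can be sketched with a simple model. This is a rough illustration only: it assumes every tree level contributes one identical series RC section (resistance `r`, capacitance `c`, both normalized), which ignores the differing node capacitances of a real tree, and it uses the Elmore approximation for the delay.

```python
def tree_decoder_devices(k: int) -> int:
    """Pass transistors in a binary tree column decoder for 2**k bit lines."""
    return sum(2 ** level for level in range(1, k + 1))   # 2 + 4 + ... + 2**k


def elmore_chain_delay(k: int, r: float = 1.0, c: float = 1.0) -> float:
    """Elmore delay of k identical series RC sections (one per tree level)."""
    # The capacitor at node i sees i series resistors: total = r*c*k*(k+1)/2.
    return sum(i * r * c for i in range(1, k + 1))


devices_4_to_1 = tree_decoder_devices(2)                    # the 4-bit-line case
slowdown = elmore_chain_delay(8) / elmore_chain_delay(4)    # doubling the sections
```

Doubling the number of series sections from 4 to 8 raises the modeled delay by a factor of 3.6 rather than 2, which is the quadratic trend that motivates inserting buffers or sizing the devices progressively.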



$$t_p = \frac{C \cdot \Delta V}{I_{av}}$$

With a large bit-line C and a small cell current I_av, make ΔV as small as possible

Idea: Use Sense Amplifier
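The benefit of a reduced swing follows directly from t_p = C · ΔV / I_av. The sketch below compares a full-swing and a small-swing bit-line transition; the capacitance, current, and voltage values are assumptions for illustration and do not come from the lecture.

```python
def discharge_time(c: float, delta_v: float, i_av: float) -> float:
    """Time for an average current i_av to move a capacitance c by delta_v."""
    return c * delta_v / i_av


# Assumed, illustrative values.
C_BL = 1e-12      # bit-line capacitance in farads
I_AV = 100e-6     # average cell current in amperes

t_full_swing = discharge_time(C_BL, 2.5, I_AV)    # bit line moves by the full supply
t_small_swing = discharge_time(C_BL, 0.25, I_AV)  # bit line moves by 250 mV only
speedup = t_full_swing / t_small_swing
```

Since the delay is linear in ΔV, sensing a swing ten times smaller cuts the bit-line delay by the same factor in this model; the sense amplifier then restores the signal to full logic levels.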


Directly applicable to SRAMs

[Figure: differential sense amplifier: inputs bit and bit_b, transistors M1 to M5, enable SE, supply V_DD, outputs Out and y]



Initialized in its meta-stable point with EQ
Once adequate voltage gap created, sense amp enabled with SE
Positive feedback quickly forces output to a stable operating point.

[Figure: latch-based sense amplifier on BL and BL_b with equalization EQ, enable signals SE, and supply V_DD]
