Introduction to Computer Organization and Architecture
Lecture 7
By Juthawut Chantharamalee
http://dusithost.dusit.ac.th/~juthawut_cha/home.htm
Outline
  Memory Arrays and Hierarchy
  SRAM Architecture
    SRAM Cell
    Decoders
    Column Circuitry
    Multiple Ports
  Serial Access Memories
  Flash
  DRAM
Memory Arrays
  Random Access Memory
    Read/Write Memory (RAM) (Volatile)
      Static RAM (SRAM)
      Dynamic RAM (DRAM)
    Read Only Memory (ROM) (Nonvolatile)
      Mask ROM
      Programmable ROM (PROM)
      Erasable Programmable ROM (EPROM)
      Electrically Erasable Programmable ROM (EEPROM)
      Flash ROM
  Serial Access Memory
    Shift Registers
      Serial In Parallel Out (SIPO)
      Parallel In Serial Out (PISO)
    Queues
      First In First Out (FIFO)
      Last In First Out (LIFO)
  Content Addressable Memory (CAM)
Levels of the Memory Hierarchy
  CPU
  Registers: part of the on-chip CPU datapath; ISA 16-128 registers
  Cache level(s): one or more levels (static RAM)
    Level 1: on-chip, 16-64K
    Level 2: on-chip, 256K-2M
    Level 3: on- or off-chip, 1M-16M
  Main memory: dynamic RAM (DRAM), 256M-16G
  Magnetic disk: interfaces SCSI, RAID, IDE, 1394; 80G-300G
  Optical disk or magnetic tape
  Farther away from the CPU:
    Lower cost/bit
    Higher capacity
    Increased access time/latency
    Lower throughput/bandwidth
Memory Hierarchy Comparisons
  Level          Capacity    Access time            Cost
  CPU Registers  100s Bytes  <10s ns
  Cache          K Bytes     10-100 ns              1-0.1 cents/bit
  Main Memory    M Bytes     200 ns-500 ns          $.0001-.00001 cents/bit
  Disk           G Bytes     10 ms (10,000,000 ns)  10^-5 - 10^-6 cents/bit
  Tape           infinite    sec-min                10^-8

  Upper levels are faster; lower levels are larger.

  Between levels       Staging xfer unit  Managed by      Size
  Registers <-> Cache  Instr. operands    prog./compiler  1-8 bytes
  Cache <-> Memory     Blocks             cache cntl      8-128 bytes
  Memory <-> Disk      Pages              OS              4K-16K bytes
  Disk <-> Tape        Files              user/operator   Mbytes
Connecting Memory
  Processor: contains MAR and MDR
  Memory: up to 2^k addressable locations; word length = n bits
  Connections between processor and memory:
    k-bit address bus
    n-bit data bus
    Control lines (R/W, MFC, etc.)
Array Architecture
  2^n words of 2^m bits each
  If n >> m, fold by 2^k into fewer rows of more columns
  Good regularity: easy to design
  Very high density if good cells are used
  [Figure: array floorplan. The row decoder takes n-k address bits and drives the wordlines; the column decoder takes k address bits; bitline conditioning sits above the memory cells (2^(n-k) rows x 2^(m+k) columns); bitlines feed the column circuitry, which outputs 2^m bits.]
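To make the folding concrete, the sketch below splits an n-bit word address into a row index and a column index and computes the folded array shape. The function names and the choice of low-order bits for the column are assumptions for illustration; the slide does not say which address bits go to which decoder.

```python
def split_address(addr, n, k):
    """Split an n-bit word address for an array folded by 2**k.

    Assumption: the low k bits go to the column decoder and the
    high n-k bits go to the row decoder.
    """
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    if not 0 <= addr < (1 << n):
        raise ValueError("address does not fit in n bits")
    row = addr >> k                # upper n-k bits select the wordline
    col = addr & ((1 << k) - 1)    # lower k bits select the word in the row
    return row, col


def array_shape(n, m, k):
    """Rows and columns of the folded array: 2**(n-k) x 2**(m+k)."""
    return 1 << (n - k), 1 << (m + k)
```

For example, `array_shape(11, 4, 3)` describes 2^11 words of 2^4 bits folded by 2^3.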
6T SRAM Cell
  Cell size accounts for most of array size
    Reduce cell size at expense of complexity
  6T SRAM Cell
    Used in most commercial chips
    Data stored in cross-coupled inverters
  Read:
    Precharge bit, bit_b
    Raise wordline
  Write:
    Drive data onto bit, bit_b
    Raise wordline
  [Figure: 6T cell between bitlines bit and bit_b, with wordline word]
SRAM Read
  Precharge both bitlines high
  Then turn on wordline
  One of the two bitlines will be pulled down by the cell
  Ex: A = 0, A_b = 1
    bit discharges, bit_b stays high
  [Figure: 6T cell with transistors N1-N4, P1, P2, internal nodes A and A_b, bitlines bit and bit_b, and wordline word]
SRAM Write
  Drive one bitline high, the other low
  Then turn on wordline
  Bitlines overpower cell with new value
  Ex: A = 0, A_b = 1, bit = 1, bit_b = 0
    Force A_b low
  [Figure: 6T cell with transistors N1-N4, P1, P2, internal nodes A and A_b, bitlines bit and bit_b, and wordline word]
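The read and write sequences can be mimicked with a purely behavioral model. This is only a logical sketch, not an electrical one: the class `SramCell` and its method names are hypothetical, and transistor sizing, timing, and read stability are ignored.

```python
class SramCell:
    """Behavioral model of a 6T SRAM cell (hypothetical, logic only).

    Node A and its complement A_b stand for the values held by the
    cross-coupled inverters.
    """

    def __init__(self, a=0):
        self.a = a & 1

    @property
    def a_b(self):
        return 1 - self.a

    def read(self):
        """Precharge both bitlines high, raise the wordline, and let the
        cell pull down the bitline on the side that stores a 0."""
        bit, bit_b = 1, 1          # precharge
        if self.a == 0:
            bit = 0                # A = 0: bit discharges, bit_b stays high
        else:
            bit_b = 0              # A_b = 0: bit_b discharges
        return bit, bit_b

    def write(self, bit, bit_b):
        """Drive one bitline high and the other low, raise the wordline,
        and let the bitlines overpower the cell."""
        if bit == bit_b:
            raise ValueError("bitlines must be driven to opposite values")
        self.a = bit & 1
```

With `A = 0`, `read()` returns `(0, 1)`, matching the read example; `write(1, 0)` then flips the cell so that `A = 1` and `A_b = 0`.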
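SRAM Column Example
  [Figure: read column and write column.
   Read: bitline conditioning, more cells, and an SRAM cell on wordline word_q1 with bitlines bit_v1f and bit_b_v1f and outputs out_v1r and out_b_v1r; timing diagram over clock phases 1 and 2 showing word_q1, bit_v1f, and out_v1r.
   Write: bitline conditioning, more cells, and an SRAM cell on wordline word_q1 with bitlines bit_v1f and bit_b_v1f, driven by data_s1 and write_q1.]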
Decoders
  n:2^n decoder consists of 2^n n-input AND gates
    One needed for each row of memory
    Build AND from NAND or NOR gates
  [Figure: 2:4 decoder with inputs A0, A1 and outputs word0-word3]
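A logic-level sketch of the n:2^n decoder follows. It models each wordline as one n-input AND of the address bits or their complements; the function name `decode` and the list-based interface are assumptions for illustration.

```python
def decode(address_bits):
    """n:2**n decoder: one n-input AND gate per wordline.

    address_bits is a list [A0, A1, ..., A(n-1)] of 0/1 values.
    Wordline i ANDs each address bit or its complement, chosen so the
    gate outputs 1 only when the address equals i.
    """
    n = len(address_bits)
    wordlines = []
    for i in range(2 ** n):
        gate_inputs = [
            bit if (i >> j) & 1 else 1 - bit
            for j, bit in enumerate(address_bits)
        ]
        wordlines.append(int(all(gate_inputs)))
    return wordlines
```

For instance, `decode([0, 1])` (A0 = 0, A1 = 1) asserts only word2.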
Large Decoders
  For n > 4, NAND gates become slow
    Break large gates into multiple smaller gates
  [Figure: 4:16 decoder with inputs A0-A3 and outputs word0-word15]
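One way to picture "breaking large gates into smaller gates" is to evaluate a wide AND as a tree of gates with limited fan-in. The helper below is a hypothetical logic-level sketch; it uses plain ANDs rather than the NAND/NOR stages a real decoder would use, and it says nothing about actual delay.

```python
def and_tree(inputs, max_fan_in=2):
    """Evaluate a wide AND as a tree of gates with at most max_fan_in inputs.

    Returns (output, levels), where levels is the number of gate stages.
    """
    if max_fan_in < 2:
        raise ValueError("max_fan_in must be at least 2")
    signals = [int(bool(x)) for x in inputs]
    if not signals:
        raise ValueError("need at least one input")
    levels = 0
    while len(signals) > 1:
        # Each group of up to max_fan_in signals feeds one smaller gate.
        signals = [
            int(all(signals[i:i + max_fan_in]))
            for i in range(0, len(signals), max_fan_in)
        ]
        levels += 1
    return signals[0], levels
```

A 4-input AND built from 2-input gates takes two stages: `and_tree([1, 1, 1, 1])` returns `(1, 2)`.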
Column Circuitry
  Some circuitry is required for each column
    Bitline conditioning
    Sense amplifiers
    Column multiplexing
Bitline Conditioning
  Precharge bitlines high before reads
  Equalize bitlines to minimize voltage difference when using sense amplifiers
  [Figures: two bitline conditioning circuits, each on bit and bit_b]
Differential Pair Amp
  Differential pair requires no clock
  But always dissipates static power
  [Figure: differential pair with inputs bit and bit_b, outputs sense_b and sense, and transistors N1, N2, N3, P1, P2]
Column Multiplexing
  Recall that array may be folded for good aspect ratio
  Ex: 2 kword x 16 folded into 256 rows x 128 columns
    Must select 16 output bits from the 128 columns
    Requires 16 8:1 column multiplexers
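The arithmetic in this example can be checked with a short sketch. The function names and the interleaved column layout (bit j of word w at column j * words_per_row + w) are assumptions chosen so that each output bit has its own multiplexer; the slide does not specify the physical layout.

```python
def fold(words, bits_per_word, rows):
    """Fold a words x bits_per_word memory into `rows` physical rows.

    Returns (columns, words_per_row). words_per_row is also the column
    multiplexer ratio (words_per_row:1), with one mux per output bit.
    """
    if words % rows != 0:
        raise ValueError("rows must divide the number of words")
    words_per_row = words // rows
    return words_per_row * bits_per_word, words_per_row


def column_mux(row_bits, bits_per_word, words_per_row, col_select):
    """Select one word out of a row of the folded array.

    Assumption: bit j of word w is stored at column
    j * words_per_row + w, so output bit j is picked by its own mux.
    """
    return [row_bits[j * words_per_row + col_select]
            for j in range(bits_per_word)]
```

`fold(2048, 16, 256)` gives 128 columns and an 8:1 ratio, so 16 multiplexers of 8:1 are needed, as stated above.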
Multiple Ports
  We have considered single-ported SRAM
    One read or one write on each cycle
  Multiported SRAMs are needed for register files
  Examples:
    Multicycle MIPS must read two sources or write a result on some cycles
    Pipelined MIPS must read two sources and write a third result each cycle
    Superscalar MIPS must read and write many sources and results each cycle
Dual-Ported SRAM
  Simple dual-ported SRAM
    Two independent single-ended reads
    Or one differential write
  Do two reads and one write by time multiplexing
    Read during ph1, write during ph2
  [Figure: dual-ported cell with bitlines bit and bit_b and wordlines wordA and wordB]
Multi-Ported SRAM
  Adding more access transistors hurts read stability
  Multiported SRAM isolates reads from state node
  Single-ended design minimizes number of bitlines
  [Figure: multi-ported cell with write circuits and read circuits, wordlines wordA-wordG, and bitlines bA-bG]
Serial Access Memories
  Serial access memories do not use an address
    Shift Registers
    Tapped Delay Lines
    Serial In Parallel Out (SIPO)
    Parallel In Serial Out (PISO)
    Queues (FIFO, LIFO)
Shift Register
  Shift registers store and delay data
  Simple design: cascade of registers
    Watch your hold times!
  [Figure: cascade of registers clocked by clk, from Din to Dout; datapath labeled 8]
Denser Shift Registers
  Flip-flops aren't very area-efficient
  For large shift registers, keep data in SRAM instead
  Move read/write pointers to RAM rather than data
    Initialize read address to first entry, write to last
    Increment address on each cycle
  [Figure: dual-ported SRAM between Din and Dout; two counters clocked by clk, with reset, generate readaddr (reset to 00...00) and writeaddr (reset to 11...11)]
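The pointer scheme can be sketched at the clock-cycle level as follows. The class name and the read-before-write ordering within a cycle are assumptions for illustration; the data itself never moves, only the two addresses advance.

```python
class SramShiftRegister:
    """Shift register built from a dual-ported RAM and two counters."""

    def __init__(self, depth):
        if depth < 2:
            raise ValueError("depth must be at least 2")
        self.ram = [0] * depth
        self.read_addr = 0             # read counter resets to 00...00
        self.write_addr = depth - 1    # write counter resets to 11...11

    def clock(self, din):
        """One clock cycle: read one port, write the other, bump both."""
        dout = self.ram[self.read_addr]      # read port
        self.ram[self.write_addr] = din      # write port
        depth = len(self.ram)
        self.read_addr = (self.read_addr + 1) % depth
        self.write_addr = (self.write_addr + 1) % depth
        return dout
```

In this particular model a value written on one cycle reappears at `dout` depth - 1 cycles later, because the read pointer starts one entry ahead of the write pointer.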
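Tapped Delay Line
  A tapped delay line is a shift register with a programmable number of stages
  Set number of stages with delay controls to mux
    Ex: 0 - 63 stages of delay
  [Figure: Din passes through cascaded shift registers (SR) of 32, 16, 8, 4, 2, and 1 stages, all clocked by clk; a mux after each one, controlled by delay5 down to delay0, selects or bypasses it on the way to Dout]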
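A clock-level sketch of the 0 to 63 stage example is shown below. The class `TappedDelayLine` is hypothetical; each sub-register is modeled with a fixed-length `collections.deque`, and bit i of the delay value plays the role of the delay_i mux control.

```python
from collections import deque


class TappedDelayLine:
    """Cascade of 32, 16, 8, 4, 2, 1-stage shift registers with bypass muxes."""

    STAGE_LENGTHS = (32, 16, 8, 4, 2, 1)    # selected by delay5 .. delay0

    def __init__(self, delay):
        if not 0 <= delay <= 63:
            raise ValueError("delay must be between 0 and 63")
        self.delay = delay
        self.stages = [deque([0] * n, maxlen=n) for n in self.STAGE_LENGTHS]

    def clock(self, din):
        """Advance every sub-register by one cycle and return Dout."""
        signal = din
        for bit, stage in zip(range(5, -1, -1), self.stages):
            shifted_out = stage[0]     # value that entered len(stage) cycles ago
            stage.append(signal)       # shift in; the oldest value drops out
            if (self.delay >> bit) & 1:
                signal = shifted_out   # mux selects the delayed path
            # otherwise the mux bypasses this sub-register
        return signal
```

With `delay = 5` (delay2 and delay0 set), the 4-stage and 1-stage registers are in the path, so the output lags the input by five cycles.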
Serial In Parallel Out
  1-bit shift register reads in serial data
    After N steps, presents N-bit parallel output
  [Figure: 4-stage shift register clocked by clk, serial input Sin, parallel outputs P0-P3]
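A minimal behavioral sketch of a serial-in parallel-out register, assuming the new bit enters at P0 and older bits move toward P3 (the class name and the shift direction are assumptions):

```python
class Sipo:
    """Serial-in parallel-out shift register (behavioral sketch)."""

    def __init__(self, n=4):
        self.bits = [0] * n            # P0 .. P(n-1)

    def clock(self, sin):
        """Shift one serial bit in and return the parallel outputs."""
        # New bit enters at P0; the bit at the last stage is dropped.
        self.bits = [sin & 1] + self.bits[:-1]
        return list(self.bits)
```

After N calls to `clock`, the list holds the last N serial bits, most recent first.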
Parallel In Serial Out
  Load all N bits in parallel when shift = 0
    Then shift one bit out per cycle
  [Figure: 4-stage shift register clocked by clk with shift/load control, parallel inputs P0-P3, serial output Sout]
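The load-then-shift behavior can be sketched as below. The class name, taking Sout from the last stage, and shifting zeros in at P0 are assumptions for illustration.

```python
class Piso:
    """Parallel-in serial-out shift register (behavioral sketch)."""

    def __init__(self, n=4):
        self.bits = [0] * n            # P0 .. P(n-1)

    def clock(self, shift, parallel_in=None):
        """shift = 0 loads all bits in parallel; shift = 1 shifts by one.

        Returns Sout, taken here from the last stage.
        """
        if shift == 0:
            if parallel_in is None or len(parallel_in) != len(self.bits):
                raise ValueError("load needs one value per stage")
            self.bits = [b & 1 for b in parallel_in]
        else:
            self.bits = [0] + self.bits[:-1]   # move toward the last stage
        return self.bits[-1]
```

Loading `[1, 0, 1, 1]` and then shifting three times sends out P3, P2, P1, P0 in that order.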
Queues
  Queues allow data to be read and written at different rates
  Read and write each use their own clock, data
  Queue indicates whether it is full or empty
  Build with SRAM and read/write counters (pointers)
  [Figure: Queue block with WriteClk, WriteData, and FULL on the write side and ReadClk, ReadData, and EMPTY on the read side]
FIFO, LIFO Queues
  First In First Out (FIFO)
    Initialize read and write pointers to first element
    Queue is EMPTY
    On write, increment write pointer
    If write almost catches read, Queue is FULL
    On read, increment read pointer
  Last In First Out (LIFO)
    Also called a stack
    Use a single stack pointer for read and write
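The FIFO pointer rules can be sketched as a circular buffer. The class name is hypothetical, and "write almost catches read" is interpreted here as the write pointer being one slot behind the read pointer, which leaves one entry unused so FULL and EMPTY can be told apart.

```python
class Fifo:
    """FIFO queue built from a RAM array and read/write pointers."""

    def __init__(self, size):
        if size < 2:
            raise ValueError("size must be at least 2")
        self.ram = [None] * size
        self.read_ptr = 0       # both pointers start at the first element
        self.write_ptr = 0

    @property
    def empty(self):
        return self.read_ptr == self.write_ptr

    @property
    def full(self):
        # Assumed reading of "write almost catches read".
        return (self.write_ptr + 1) % len(self.ram) == self.read_ptr

    def write(self, data):
        if self.full:
            raise OverflowError("queue is FULL")
        self.ram[self.write_ptr] = data
        self.write_ptr = (self.write_ptr + 1) % len(self.ram)

    def read(self):
        if self.empty:
            raise IndexError("queue is EMPTY")
        data = self.ram[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % len(self.ram)
        return data
```

With `size = 4`, three writes make the queue FULL, and reads return the values in the order they were written.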
Memory Timing: Approaches
  DRAM timing: multiplexed addressing
    [Figure: the address bus carries the row address and then the column address; RAS and CAS strobes; RAS-CAS timing]
  SRAM timing: self-timed
    [Figure: the address bus carries the full address; an address transition initiates the memory operation]
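Multiplexed addressing can be illustrated by splitting one address into the two values that share the address bus. The function name and the choice of high-order bits for the row are assumptions; real parts define their own bit mapping and timing.

```python
def multiplex_address(addr, row_bits, col_bits):
    """Split a full address into the two halves sent over a shared bus.

    Returns the sequence of (strobe, value) pairs: the row address is
    latched with RAS first, then the column address with CAS.
    Assumption: the row address comes from the high-order bits.
    """
    if not 0 <= addr < (1 << (row_bits + col_bits)):
        raise ValueError("address does not fit in row_bits + col_bits")
    row = addr >> col_bits
    col = addr & ((1 << col_bits) - 1)
    return [("RAS", row), ("CAS", col)]
```

For a 10-bit address split 5/5, `multiplex_address(0x2A5, 5, 5)` yields row 21 followed by column 5, so only 5 address pins are needed instead of 10.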
Non-Volatile Memories
  Floating-gate transistor
  [Figure: device cross-section (source, drain, gate, floating gate, oxide thickness tox, n+ regions, p substrate) and schematic symbol (G, S, D)]
NOR Flash Operations: Erase
  [Figure: single cell with terminals G, S, D and a 12 V erase voltage; cell array with BL0 and BL1 open, WL0 = 0 V, WL1 = 0 V]
NOR Flash Operations: Program
  [Figure: single cell with terminals G, S, D and applied voltages of 12 V and 6 V; cell array with BL0 = 6 V, BL1 = 0 V, WL0 = 12 V, WL1 = 0 V]
NOR Flash Operations: Read
  [Figure: single cell with terminals G, S, D and applied voltages of 5 V and 1 V; cell array with BL0 = 1 V, BL1 = 0 V, WL0 = 5 V, WL1 = 0 V]
NAND Flash Memory
  [Figure: NAND flash array showing the unit cell, word line (poly), and source line (diff. layer). Courtesy Toshiba]
Read-Write Memories (RAM)
  Static (SRAM)
    Data stored as long as supply is applied
    Large (6 transistors/cell)
    Fast
    Differential
  Dynamic (DRAM)
    Periodic refresh required
    Small (1-3 transistors/cell)
    Slower
    Single ended
1-Transistor DRAM Cell
  Write: Cs is charged or discharged by asserting WL and BL
  Read: Charge redistribution takes place between bit line and storage capacitance
  Voltage swing is small; typically around 250 mV
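The small read swing follows from charge conservation when the storage capacitor is connected to the precharged bit line: the change in bit-line voltage is (V_cell - V_pre) * Cs / (Cs + C_BL). The sketch below evaluates that relation; the function name and every numeric value in the example are assumed for illustration and do not come from the slide.

```python
def bitline_swing(v_cell, v_pre, c_s, c_bl):
    """Bit-line voltage change after charge redistribution.

    v_cell: voltage stored on the cell capacitor Cs before the read
    v_pre:  bit-line precharge voltage
    c_s:    storage capacitance (farads)
    c_bl:   bit-line capacitance (farads)
    """
    return (v_cell - v_pre) * c_s / (c_s + c_bl)


# Assumed example values: Cs = 50 fF, C_BL = 200 fF, precharge at 1.25 V.
swing_one = bitline_swing(2.5, 1.25, 50e-15, 200e-15)    # reading a stored 1
swing_zero = bitline_swing(0.0, 1.25, 50e-15, 200e-15)   # reading a stored 0
```

With these assumed values the bit line moves by about +0.25 V or -0.25 V, the same order as the swing quoted above, because Cs is much smaller than the bit-line capacitance.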
DRAM Cell Observations
  1T DRAM requires a sense amplifier for each bit line, due to charge redistribution read-out.
  DRAM memory cells are single ended, in contrast to SRAM cells.
  The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation.
  1T cell requires presence of an extra capacitance that must be explicitly included in the design.
  When writing a “1” into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a higher value than VDD.
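Sense Amp Operation
  [Figure: bit-line voltage VBL versus time t, starting at VPRE. After the word line is activated the bit line shifts by ΔV(1); after the sense amp is activated it is driven to V(1) or V(0).]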
DRAM Timing
The End Lecture 7