new homework to be posted this weekend project phase 3
TRANSCRIPT
EE141
1
EE141 EECS141 1 Lecture #24
EE141 EECS141 2 Lecture #24
New homework to be posted this weekend Last one to be graded
Project Phase 3 Launched Next We Some insights today
EE141
2
EE141 EECS141 3 Lecture #24
Last lecture Adders
Today’s lecture Multipliers Introduction to Memory
Reading (Ch 11)
EE141 EECS141 4 Lecture #24
EE141
3
EE141 EECS141 5 Lecture #24
EE141 EECS141 6 Lecture #24
EE141
4
EE141 EECS141 7 Lecture #24
EE141 EECS141 8 Lecture #24
Critical Path 1 & 2
EE141
5
EE141 EECS141 9 Lecture #24
Balanced tsum and tcarry
EE141 EECS141 10 Lecture #24
EE141
6
EE141 EECS141 11 Lecture #24
EE141 EECS141 12 Lecture #24
EE141
7
EE141 EECS141 13 Lecture #24
EE141 EECS141 14 Lecture #24
EE141
8
EE141 EECS141 15 Lecture #24
Optimization constraints different than in binary adder Once again:
– Need to identify critical path – And find ways to use parallelism to reduce it
Other possible techniques Logarithmic versus linear (Wallace Tree Mult) Data encoding (Booth) Pipelining
First glimpse at system level optimization
EE141 EECS141 16 Lecture #24
EE141
9
EE141 EECS141 17 Lecture #24
Area Dominated by Wiring
EE141 EECS141 18 Lecture #24
Widthbarrel ~ 2 pm M
EE141
10
EE141 EECS141 19 Lecture #24
EE141 EECS141 20 Lecture #24
A 3
A 2
A 1
A 0
Out3
Out2
Out1
Out0
EE141
11
EE141 EECS141 21 Lecture #24
EE141 EECS141 22 Lecture #24 22
EE141
12
EE141 EECS141 23 Lecture #24
Intel 45nm Core 2
EE141 EECS141 24 Lecture #24
Read-Write Memory Non-Volatile Read-Write
Memory Read-Only Memory
EPROM E 2 PROM
FLASH
Random Access
Non-Random Access
SRAM DRAM
Mask-Programmed Programmable (PROM)
FIFO
Shift Register CAM
LIFO
EE141
13
EE141 EECS141 25 Lecture #24 25
STATIC (SRAM)
DYNAMIC (DRAM)
Data stored as long as supply is applied Larger (6 transistors/cell) Fast Differential (usually)
Periodic refresh required Smaller (1-3 transistors/cell) Slower Single Ended
EE141 EECS141 26 Lecture #24
Conceptual: linear array Each box holds some data But this does not lead to a nice layout shape Too long and skinny
Create a 2-D array Decode Row and Column
address to get data
EE141
14
EE141 EECS141 27 Lecture #24
Word 0 Word 1 Word 2
Word N-2 Word N-1
Storage cell
M bits M bits
N words
S 0 S 1 S 2
S N-2
A 0 A 1
A K-1
K = log 2 N S N -1
Word 0 Word 1 Word 2
Word N-2 Word N-1
Storage cell
S 0
Input-Output ( M bits)
Intuitive architecture for N x M memory Too many select signals:
N words == N select signals K = log 2 N Decoder reduces the number of select signals
Input-Output ( M bits)
Decoder
EE141 EECS141 28 Lecture #24
EE141
15
EE141 EECS141 29 Lecture #24
Collection of 2M complex logic gates Organized in regular and dense fashion
(N)AND Decoder
NOR Decoder
EE141 EECS141 30 Lecture #24
Look at decoder for 256x256 memory block (8KBytes)
EE141
16
EE141 EECS141 31 Lecture #24
Goal: Build fastest possible decoder with static CMOS logic
What we know Basically need 256 AND
gates, each one of them drives one word line
N=8
EE141 EECS141 32 Lecture #24
Each word line has 256 cells connected to it Total output load is 256*Ccell + Cwire
Assume that decoder input capacitance is Caddress=4*Ccell
Each address drives 28/2 AND gates A0 drives ½ of the gates, A0_b the other ½ of the
gates Neglecting Cwire, the fan-out on each one of the
16 address wires is: B
EE141
17
EE141 EECS141 33 Lecture #24
FB of at least 213 means that we will want to use more than log4(213) = 6.5 stages to implement the AND8
Need many stages anyways So what is the best way to implement the AND
gate? Will see next that it’s the one with the most stages
and least complicated gates
EE141 EECS141 34 Lecture #24
LE=10/3 1 ΠLE = 10/3 P = 8 + 1
LE=2 5/3 ΠLE = 10/3 P = 4 + 2
LE=4/3 5/3 4/3 1 ΠLE = 80/27 P = 2 + 2 + 2 + 1
EE141
18
EE141 EECS141 35 Lecture #24
Using 2-input NAND gates 8-input gate takes 6 stages
Total LE is (4/3)3 ≈ 2.4 So PE is 2.4*213 – optimal N of ~7.1
EE141 EECS141 36 Lecture #24
256 8-input AND gates Each built out of
tree of NAND gates and inverters
Issue: Every address line has
to drive 128 gates (and wire) right away
Can’t build gates small enough - Forces us to add buffers just to drive address inputs
EE141
19
EE141 EECS141 37 Lecture #24
EE141 EECS141 38 Lecture #24
Use a single gate for each of the shared terms E.g., from A0, A0, A1, and A1, generate four
signals: A0A1, A0A1, A0A1, A0A1
In other words, we are decoding smaller groups of address bits first And using the “predecoded” outputs to do
the rest of the decoding
EE141
20
EE141 EECS141 39 Lecture #24