Caltech CS184 Winter2003 -- DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 10: January 31, 2003 Compute 2:



Last Time

• LUTs
  – area
  – structure
  – big LUTs vs. small LUTs with interconnect
  – design space
  – optimization


Today

• Cascades
• ALUs
• PLAs


Last Time

• Larger LUTs
  – Less interconnect delay

• General: Larger compute blocks
  – Minimize interconnect

• Large LUTs
  – Not efficient for typical logic structure


Different Structure

• How can we have “larger” compute nodes (less general interconnect) without paying huge area penalty of large LUTs?


Structure in subgraphs

• Small LUTs capture structure

• What structure does a small-LUT-mapped netlist have?


Structure

• LUT sequences ubiquitous


Hardwired Logic Blocks

Single Output


Hardwired Logic Blocks

Two outputs


Delay Model

• Tcascade = T(3LUT) + T(mux)


Options


Chung & Rose Study

[Chung & Rose, DAC ’92]


Cascade LUT Mappings

[Chung & Rose, DAC ’92]


ALU vs. Cascaded LUT?


4-LUT Cascade ALU


ALU vs. LUT ?

• Compare/contrast
• ALU
  – Only subset of ops available
  – Denser coding for those ops
  – Smaller
  – …but interconnect dominates


Parallel Prefix LUT Cascade?

• Can compute LUT cascade in O(log(N)) time?

• Can compute mux cascade using parallel prefix?

• Can make mux cascade associative?


Parallel Prefix Mux cascade

• How can mux transform S → mux-out?
  – A=0, B=0 → mux-out=0
  – A=1, B=1 → mux-out=1
  – A=0, B=1 → mux-out=S
  – A=1, B=0 → mux-out=/S


Parallel Prefix Mux cascade

• How can mux transform S → mux-out?
  – A=0, B=0 → mux-out=0    Stop = S
  – A=1, B=1 → mux-out=1    Generate = G
  – A=0, B=1 → mux-out=S    Buffer = B
  – A=1, B=0 → mux-out=/S   Invert = I


Parallel Prefix Mux cascade

• How can 2 muxes transform input?
• Can I compute the 2-mux transform from the 1-mux transforms?


Two-mux transforms

• SS → S   • SG → G   • SB → S   • SI → G

• GS → S   • GG → G   • GB → G   • GI → S

• BS → S   • BG → G   • BB → B   • BI → I

• IS → S   • IG → G   • IB → I   • II → B
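
As a cross-check on the table above, here is a minimal Python sketch (not from the lecture) that derives the two-mux entries from the four single-mux transforms. It assumes the mux passes A when S=0 and B when S=1, which is the convention that reproduces the A/B cases listed earlier, and it reads each entry as first-mux transform, second-mux transform, combined transform. The names `TRANSFORMS`, `compose`, and `two_mux_table` are illustrative only.

```python
# Each single-mux transform is stored as (output when S=0, output when S=1).
TRANSFORMS = {
    "S": (0, 0),  # Stop:     A=0, B=0, output is always 0
    "G": (1, 1),  # Generate: A=1, B=1, output is always 1
    "B": (0, 1),  # Buffer:   A=0, B=1, output follows S
    "I": (1, 0),  # Invert:   A=1, B=0, output is /S
}
NAMES = {table: name for name, table in TRANSFORMS.items()}


def compose(first, second):
    """Transform seen when the output of `first` drives the select of `second`."""
    f, g = TRANSFORMS[first], TRANSFORMS[second]
    return NAMES[(g[f[0]], g[f[1]])]


def two_mux_table():
    """All 16 pairs as three letters: first, second, combined (e.g. 'SIG')."""
    return [first + second + compose(first, second)
            for first in "SGBI" for second in "SGBI"]


if __name__ == "__main__":
    table = two_mux_table()
    for row in range(4):
        print("  ".join(table[4 * row: 4 * row + 4]))
```

The set {S, G, B, I} is closed under composition, so a chain of any length still reduces to one of these four transforms.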


Generalizing mux-cascade

• How can N muxes transform the input?
• Is mux transform composition associative?
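
A brute-force check suggests the answer is yes: each transform is a function from one bit to one bit, and function composition is associative. The sketch below (Python, illustrative names, not the lecture's circuit) checks all 64 triples and then computes every prefix of an N-mux chain in ceil(log2 N) rounds. The scan used is a simple Hillis-Steele style scan, chosen here for brevity; it does O(N log N) compositions, and other prefix networks trade depth against work differently. The mux convention (A when S=0, B when S=1) is an assumption consistent with the transform table.

```python
from itertools import product

# (output when S=0, output when S=1) for Stop, Generate, Buffer, Invert.
TRANSFORMS = {"S": (0, 0), "G": (1, 1), "B": (0, 1), "I": (1, 0)}
NAMES = {table: name for name, table in TRANSFORMS.items()}


def compose(first, second):
    """Transform seen when the output of `first` drives the select of `second`."""
    f, g = TRANSFORMS[first], TRANSFORMS[second]
    return NAMES[(g[f[0]], g[f[1]])]


def is_associative():
    """Check (x.y).z == x.(y.z) for every triple of transforms."""
    return all(compose(compose(x, y), z) == compose(x, compose(y, z))
               for x, y, z in product("SGBI", repeat=3))


def serial_prefix(chain):
    """Reference: walk the chain one mux at a time (N - 1 sequential steps)."""
    out, acc = [], None
    for t in chain:
        acc = t if acc is None else compose(acc, t)
        out.append(acc)
    return out


def parallel_prefix(chain):
    """Hillis-Steele style scan; returns (prefixes, number of rounds)."""
    cur, offset, rounds = list(chain), 1, 0
    while offset < len(cur):
        # Every position combines with the partial result `offset` places earlier.
        cur = [cur[i] if i < offset else compose(cur[i - offset], cur[i])
               for i in range(len(cur))]
        offset *= 2
        rounds += 1
    return cur, rounds


def apply_transform(name, s):
    """Output of a (composite) transform for cascade input bit s."""
    return TRANSFORMS[name][s]


if __name__ == "__main__":
    chain = list("BIGBISIB")
    prefixes, rounds = parallel_prefix(chain)
    print(is_associative(), prefixes, rounds)
```

Applying `apply_transform(prefixes[i], s)` gives the output of mux i for cascade input `s` without waiting for the i muxes before it.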


Parallel Prefix Mux-cascade


Commercial Devices


Xilinx XC4000 CLB


Xilinx Virtex-II


Altera Stratix


Programmable Logic Arrays (PLAs)


PLA

• Directly implement flat (two-level) logic
  – O=a*b*c*d + !a*b*!d + b*!c*d

• Exploit substrate properties that allow wired-OR
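
To make the two-level form concrete, here is a small Python sketch (an illustration, not code from the course) that evaluates the example function as a list of product terms followed by a single OR. The representation of a term as a dictionary and the names `PRODUCT_TERMS`, `pla_output`, and `reference` are assumptions made for this example.

```python
from itertools import product

# Each product term maps a variable to the value it must have:
# 1 for a plain literal (a), 0 for a complemented literal (!a).
# Variables missing from a term are not connected to that term.
PRODUCT_TERMS = [
    {"a": 1, "b": 1, "c": 1, "d": 1},   # a*b*c*d
    {"a": 0, "b": 1, "d": 0},           # !a*b*!d
    {"b": 1, "c": 0, "d": 1},           # b*!c*d
]


def pla_output(terms, inputs):
    """First level: AND each term's literals. Second level: OR the terms."""
    and_level = [all(inputs[v] == want for v, want in term.items())
                 for term in terms]
    return int(any(and_level))


def reference(a, b, c, d):
    """The same function written directly as a Boolean expression."""
    return int((a and b and c and d) or (not a and b and not d)
               or (b and not c and d))


if __name__ == "__main__":
    for a, b, c, d in product((0, 1), repeat=4):
        got = pla_output(PRODUCT_TERMS, {"a": a, "b": b, "c": c, "d": d})
        print(a, b, c, d, got)
```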


Wired-or

• Connect series of inputs to wire
• Any of the inputs can drive the wire high


Wired-or

• Implementation with Transistors


Programmable Wired-or

• Use some memory function to programmably connect (disconnect) wires to OR

• Fuse:


Programmable Wired-or

• Gate-memory model


Diagram Wired-or


Wired-or array

• Build into array
  – Compute many different OR functions from set of inputs
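
A behavioral sketch of such an array, in Python, may help: each output wire is the OR of whichever inputs are programmed to connect to it. This models only the logic, not the transistor-level circuit, and the function name `wired_or_array` and the connection-matrix layout are assumptions for illustration.

```python
def wired_or_array(connections, inputs):
    """Evaluate a programmable wired-OR array.

    connections[j][i] == 1 means input i is programmed to connect to
    output wire j; any connected input that is high drives that wire high.
    """
    return [int(any(c and x for c, x in zip(row, inputs)))
            for row in connections]


if __name__ == "__main__":
    # Three inputs, three output wires; the last wire has no connections.
    connections = [
        [1, 0, 1],   # wire 0 = in0 OR in2
        [0, 1, 1],   # wire 1 = in1 OR in2
        [0, 0, 0],   # wire 2 is never driven high
    ]
    print(wired_or_array(connections, [0, 0, 1]))
```

Each output wire costs one crosspoint per potential input, whether or not that input is actually connected.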


Combined or-arrays to PLA

• Combine two or (nor) arrays to produce PLA (and-or array)

Programmable Logic Array
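
The reason two NOR arrays can act as an AND-OR array is De Morgan's law: a NOR of complemented literals is the AND of the literals, and the second NOR yields the complement of the OR, which a final inversion restores. The Python sketch below checks this equivalence on a small example. The term encoding, the function names, and the placement of the output inverter are assumptions for illustration, not a description of any particular device.

```python
from itertools import product


def nor(bits):
    """NOR gate: high only when every input is low."""
    return int(not any(bits))


def and_or(terms, inputs):
    """Reference AND-OR evaluation. A term is a list of (input index, polarity)."""
    products = [int(all(inputs[i] == pol for i, pol in term)) for term in terms]
    return int(any(products))


def nor_nor(terms, inputs):
    """Same function built from two NOR planes plus an output inverter."""
    # First NOR plane: feed each term the COMPLEMENT of the literals it wants.
    products = [nor([int(inputs[i] != pol) for i, pol in term]) for term in terms]
    # Second NOR plane gives NOT(OR of products); invert to recover the OR.
    return 1 - nor(products)


if __name__ == "__main__":
    # a*b*c*d + !a*b*!d + b*!c*d with a, b, c, d at indices 0..3
    terms = [
        [(0, 1), (1, 1), (2, 1), (3, 1)],
        [(0, 0), (1, 1), (3, 0)],
        [(1, 1), (2, 0), (3, 1)],
    ]
    same = all(and_or(terms, v) == nor_nor(terms, v)
               for v in product((0, 1), repeat=4))
    print(same)
```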


PLA

• Can implement each AND on a single line in the first array

• Can implement each OR on a single line in the second array


PLA

• Efficiency questions:
  – Each and/or is linear in total number of potential inputs (not actual)
  – How many product terms between arrays?


PLAs

• Fast implementations for large ANDs or ORs
• Number of P-terms can be exponential in number of input bits
  – most complicated functions
  – not exponential for many functions

• Can use arrays of small PLAs
  – to exploit structure
  – like we saw arrays of small memories last time


PLA Product Terms

• Can be exponential in number of inputs
• E.g. n-input xor (parity function)
  – When flattened to two-level logic, requires exponential product terms
  – a*!b+!a*b
  – a*!b*!c+!a*b*!c+!a*!b*c+a*b*c

• …and shows up in important functions
  – Like addition…
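
The exponential growth for parity can be checked directly. A product term that omits a variable covers two input combinations that differ in one bit, and those two always have opposite parity, so no two on-set minterms of xor can share a term. The Python sketch below (illustrative names, not from the lecture) enumerates the on-set, confirms that no pair is adjacent, and counts 2^(n-1) terms.

```python
from itertools import combinations, product


def parity_minterms(n):
    """Input combinations for which the n-input xor is 1."""
    return [bits for bits in product((0, 1), repeat=n) if sum(bits) % 2 == 1]


def hamming(u, v):
    """Number of positions in which two input combinations differ."""
    return sum(a != b for a, b in zip(u, v))


def mergeable_pairs(minterms):
    """Pairs differing in exactly one bit, which one product term could cover."""
    return [(u, v) for u, v in combinations(minterms, 2) if hamming(u, v) == 1]


def term_to_text(bits):
    """Write a minterm in the slide's notation, e.g. (1, 0, 0) -> 'a*!b*!c'."""
    names = "abcdefghijklmnopqrstuvwxyz"
    return "*".join(("" if b else "!") + names[i] for i, b in enumerate(bits))


if __name__ == "__main__":
    for n in range(2, 9):
        on_set = parity_minterms(n)
        print(n, len(on_set), len(mergeable_pairs(on_set)))
    print(" + ".join(term_to_text(m) for m in parity_minterms(3)))
```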


PLAs vs. LUTs?

• Look at Inputs, Outputs, P-Terms
  – minimum area (one study, see paper)
  – K=10, N=12, M=3

• A(PLA 10,12,3) comparable to 4-LUT?
  – 80-130%?
  – 300% on ECC (structure LUT can exploit)

• Delay?
  – Claim 40% fewer logic levels (4-LUT)
    • (general interconnect crossings)

[Kouloheris & El Gamal/CICC’92]


PLA


PLA and Memory


PLA and PAL

PAL = Programmable Array Logic


Conventional/Commercial FPGA

Altera 9K (from databook)



Big Ideas [MSB Ideas]

• Programmable Interconnect allows us to exploit that structure
  – want to match to application structure
  – Prog. interconnect delay expensive

• Hardwired Cascades
  – key technique to reducing delay in programmables

• PLAs
  – canonical two level structure
  – hardwire portions to get Memories, PALs
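
One common reading of the last point, assumed here rather than taken from the slides, is that a memory is a PLA whose AND plane is hardwired as a full address decoder (one product term per address) while the OR plane stays programmable and holds the stored words. The Python sketch below models that reading; `decoder_and_plane` and `memory_as_pla` are illustrative names.

```python
from itertools import product


def decoder_and_plane(n):
    """Hardwired AND plane: one product term for each of the 2**n addresses."""
    return list(product((0, 1), repeat=n))


def memory_as_pla(contents, address_bits):
    """Read a word from a memory modeled as decoder AND plane plus OR plane.

    contents[k] is the tuple of output bits stored at address k, where the
    address is read MSB first from address_bits.
    """
    minterms = decoder_and_plane(len(address_bits))
    # Exactly one product-term line is high: the one matching the address.
    active = [int(tuple(address_bits) == m) for m in minterms]
    width = len(contents[0])
    # Programmable OR plane: line k connects to output j iff contents[k][j] == 1.
    return tuple(int(any(active[k] and contents[k][j]
                         for k in range(len(minterms))))
                 for j in range(width))


if __name__ == "__main__":
    contents = [(1, 0, 1), (0, 0, 0), (1, 1, 0), (0, 1, 1)]
    for addr in product((0, 1), repeat=2):
        print(addr, memory_as_pla(contents, addr))
```

Under the same reading, a PAL goes the other way: it keeps the AND plane programmable and hardwires which product terms feed each OR.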


Big Ideas [MSB-1 Ideas]

• Better structure match with hardwired LUT cascades