Content Addressable Memories Cell Design and Peripheral Circuits

Upload: chrystal-hall

Post on 17-Jan-2016


Page 1: Content Addressable Memories Cell Design and Peripheral Circuits

Content Addressable Memories

Cell Design and Peripheral Circuits

Page 2: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Introduction CAM vs. RAM

RAM (read by address):

Address  Data
5        00110111
4        10001101
3        10111101
2        11001011
1        00001101
0        01010101

Address In = 4, Data Out = 10001101

CAM (search by content):

Address  Data
5        11000111
4        00011101
3        10001101
2        11001011
1        00001101
0        01010101

Data In = 10001101, Address Out = 3
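The contrast on this slide can be sketched in a few lines of Python. This is only a behavioural illustration: the function names `ram_read` and `cam_search` are made up for the example, and the loop stands in for what the hardware does on all rows in parallel.

```python
# Contents taken from the slide; list index = address.
RAM = ["01010101", "00001101", "11001011", "10111101", "10001101", "00110111"]
CAM = ["01010101", "00001101", "11001011", "10001101", "00011101", "11000111"]


def ram_read(memory, address):
    """RAM: the address goes in, the stored word comes out."""
    return memory[address]


def cam_search(memory, key):
    """CAM: the data goes in, the matching address(es) come out.

    A real CAM compares every row at once; this loop only models the result.
    """
    return [addr for addr, word in enumerate(memory) if word == key]


print(ram_read(RAM, 4))             # 10001101
print(cam_search(CAM, "10001101"))  # [3]
```

A CAM may return several addresses when a word is stored more than once, which is why `cam_search` returns a list here.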

Page 3: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Introduction Binary CAM Cell

ML pre-charged to VDD

Match: ML remains at VDD

Mismatch: ML discharges

[Figure: binary CAM cell schematic. Signals: BL1, BL1c, WL, SL1, SL1c, ML; storage nodes BL1_cell, BL1c_cell; transistors P1, P2, N1 to N8.]

Page 4: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Introduction Ternary CAM (TCAM)

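Example 1: masked bits (X) in the stored words and in the input keyword

Address  Stored word  Result
5        00X00111
4        01001101     Match
3        00011101
2        110010X1
1        10101101     Match
0        010X0101

Input Keyword: XXX01101 (matches at addresses 1 and 4)

Example 2: stored words with masked leading bits

Address  Stored word  Result
5        XXXXX111
4        XXXX1101     Match
3        XXX11101
2        XX001011
1        X0001101     Match
0        01010101

Input Keyword: 10001101 (matches at addresses 1 and 4)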
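The two examples on this slide can be reproduced with a small behavioural model. This is a sketch only: `ternary_match` and `tcam_search` are illustrative names, and an X on either side (stored word or keyword) is treated as "always matches".

```python
def ternary_match(stored, key):
    """True when every bit position matches or is masked (X) on either side."""
    return all(s == "X" or k == "X" or s == k for s, k in zip(stored, key))


def tcam_search(table, key):
    """Return the addresses of all stored words that match the keyword."""
    return sorted(addr for addr, word in table.items() if ternary_match(word, key))


# Example 1 from the slide: keyword with masked leading bits.
table_1 = {5: "00X00111", 4: "01001101", 3: "00011101",
           2: "110010X1", 1: "10101101", 0: "010X0101"}
print(tcam_search(table_1, "XXX01101"))  # [1, 4]

# Example 2 from the slide: stored words with masked leading bits.
table_2 = {5: "XXXXX111", 4: "XXXX1101", 3: "XXX11101",
           2: "XX001011", 1: "X0001101", 0: "01010101"}
print(tcam_search(table_2, "10001101"))  # [1, 4]
```

Both searches report two matches, so a practical TCAM needs a way to choose among them. The slide does not show that step, and the sketch leaves it out.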

Page 5: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Introduction TCAM Cell

Global masking: SLs

Local masking: BLs

BL1  BL2  Logic
0    1    0
1    0    1
1    1    X
0    0    N.A.

[Figure: TCAM cell block diagram with two RAM cells and the comparison logic. Signals: BL1, BL1c, BL2, BL2c, WL, SL1, SL2, ML.]
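A minimal sketch of the cell-level comparison follows. The stored-bit encoding comes from the table above. The search-line encoding and the pull-down condition are assumptions, inferred from the Type I / Type II mismatch table on the Dual-ML slide later in the deck; they are not stated on this slide.

```python
# (BL1, BL2) encoding of the stored value, as given in the slide's table.
STORED = {"0": (0, 1), "1": (1, 0), "X": (1, 1)}

# Assumed (SL1, SL2) encoding of the search bit; X = globally masked (both low).
SEARCH = {"0": (0, 1), "1": (1, 0), "X": (0, 0)}


def cell_mismatch(stored, key):
    """True when the cell would pull the match line down.

    Assumed condition: a pull-down path conducts when a search line is high
    while the stored bit on the same side is low.
    """
    bl1, bl2 = STORED[stored]
    sl1, sl2 = SEARCH[key]
    return bool((sl1 and not bl1) or (sl2 and not bl2))


for stored in "01X":
    for key in "01X":
        print(stored, key, "mismatch" if cell_mismatch(stored, key) else "match")
```

Under these assumptions a stored X (local mask) or a search X (global mask) never opens a pull-down path, so only the two real disagreements discharge ML.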

Page 6: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Introduction DRAM based TCAM Cell

Higher bit density

Slower table update

Expensive process

Refreshing circuitry

Scaling issues (leakage)

[Figure: DRAM-based TCAM cell schematic. Signals: BL1, BL2, WL, SL1, SL2, ML; storage nodes BL1_cell, BL2_cell; transistors N3 to N8.]

Page 7: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Introduction SRAM based TCAM Cell

Standard CMOS process

Fast table update

Large area (16T)

[Figure: SRAM-based TCAM cell schematic. Signals: BL1, BL1c, BL2, BL2c, WL, SL1, SL2, ML; storage nodes BL1c_cell, BL2c_cell.]

Page 8: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Introduction Block diagram of a 256 x 144 TCAM

[Figure: block diagram of the TCAM array. Each row holds CAM cells (0) to (143) with bit lines BL1c/BL2c. SL drivers drive the search lines SL1(0)/SL2(0) to SL1(143)/SL2(143). Match lines ML0 to ML255 feed the ML sense amplifiers (MLSA), with outputs MLSO(0) to MLSO(255).]

Page 9: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Introduction Why low-power TCAMs?

Parallel search: very high power (2 Mb Sibercore TCAM, 66 MHz, 66 Msps: 3.4 W)

IPv6, OC-768: larger word size and larger number of entries, hence high power

Embedded applications (SoC)
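To put the quoted figure in perspective, the arithmetic below converts it to energy per search. It is a rough estimate under two assumptions that the slide does not state: all 3.4 W is spent on searching, and "2 Mb" means 2 x 2^20 bits.

```python
power_w = 3.4               # from the slide
searches_per_s = 66e6       # 66 Msps, from the slide
capacity_bits = 2 * 2**20   # assumption: "2 Mb" read as 2 x 2^20 bits

energy_per_search_j = power_w / searches_per_s
energy_per_bit_j = energy_per_search_j / capacity_bits

print(f"{energy_per_search_j * 1e9:.1f} nJ per search")          # 51.5 nJ per search
print(f"{energy_per_bit_j * 1e15:.1f} fJ per bit per search")    # 24.6 fJ per bit per search
```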

Page 10: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Introduction

Why high-performance TCAMs?

OC-768: 135M packets/s (7.4 ns/packet)

Application complexity: multiple searches

IPv6: larger word size, larger search time
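The 7.4 ns figure is simply the reciprocal of the packet rate. The sketch below also shows how the budget shrinks when one packet needs several searches. The function `search_budget_ns` and the searches-per-packet counts are hypothetical, chosen only to illustrate the "multiple searches" point.

```python
packet_rate = 135e6                      # packets per second, from the slide
time_per_packet_ns = 1e9 / packet_rate   # about 7.4 ns


def search_budget_ns(searches_per_packet):
    """Time available per search if the searches run back to back."""
    return time_per_packet_ns / searches_per_packet


print(f"{time_per_packet_ns:.1f} ns per packet")
for n in (1, 2, 4):  # hypothetical searches per packet
    print(n, f"{search_budget_ns(n):.2f} ns per search")
```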

Page 11: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Design Techniques

Cell Design: 12T Static TCAM cell*

‘0’ is retained by leakage (VWL ~ 200 mV)

High density

Leakage (3 orders of magnitude)

Noise margin

Soft errors (node S)

Unsuitable for READ

* I. Arsovski, T. Chandler, A. Sheikholeslami, IEEE JSSC, vol. 38, no. 1, pp. 155-158, Jan. 2003

Page 12: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Design Techniques

Cell Design: NAND vs. NOR Type CAM

NAND-type CAM: low power, charge sharing, slow

[Figure: NAND-type and NOR-type CAM words. NAND type: CAM cells (0) to (N) in series along ML_NAND, followed by an SA. NOR type: CAM cells (0) to (N) in parallel on ML_NOR, followed by an SA. Cell schematics show BL1, BL1c, WL, SL1, SL1c and VDD.]
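The difference between the two match-line styles can be sketched behaviourally. In the NOR type the cells sit in parallel and any mismatching cell discharges the pre-charged ML. In the NAND type the cells sit in series and the chain conducts only when every cell matches. Counting the lines that switch is used here as a crude stand-in for dynamic power; that is an assumption of the sketch, not a circuit simulation. The table is the CAM example from the CAM vs. RAM slide.

```python
CAM = ["01010101", "00001101", "11001011", "10001101", "00011101", "11000111"]


def nor_ml_discharges(stored, key):
    """NOR type: one mismatching cell is enough to pull ML down."""
    return any(s != k for s, k in zip(stored, key))


def nand_chain_conducts(stored, key):
    """NAND type: the series chain conducts only if all cells match."""
    return all(s == k for s, k in zip(stored, key))


key = "10001101"
nor_matches = [a for a, w in enumerate(CAM) if not nor_ml_discharges(w, key)]
nand_matches = [a for a, w in enumerate(CAM) if nand_chain_conducts(w, key)]

# Lines that switch during this search (rough proxy for dynamic power).
nor_switching = sum(nor_ml_discharges(w, key) for w in CAM)
nand_switching = sum(nand_chain_conducts(w, key) for w in CAM)

print(nor_matches, nand_matches)      # [3] [3]
print(nor_switching, nand_switching)  # 5 1
```

Both styles find the same word, but in a typical search most words mismatch, so far more NOR lines switch. The long series chain is what makes the NAND type slow.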

Page 13: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Design Techniques MLSA Design: Conventional

Pre-charge ML to VDD

Match: VML = VDD

Mismatch: VML = 0

[Figure: conventional MLSA schematic. Signals: PRE, ML, MLSO, VDD.]

Page 14: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Design Techniques MLSA Design: Current Race Sensing*

[Figure: current-race MLSA schematic. Labels: Dummy ML, Delay, MLOFF, RST, RSTc, ML, MLSO, MATCH, VDD.]

* I. Arsovski, T. Chandler, A. Sheikholeslami, IEEE JSSC, vol. 38, no. 1, pp. 155-158, Jan. 2003

Page 15: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Design Techniques

MLSA Design: Current Race Sensing

No need to reset SLs in every clock cycle

Lower ML voltage swing: (Vth + ∆V) ≈ ½VDD

Trade-off: speed, current, voltage margin

[Figure: waveforms of ML [0], ML [1] and MLSO [0], with the voltage margin marked.]

Page 16: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Design Techniques

MLSA Design: Charge Redistribution*

Fast pre-charge of ML through MREF

Mismatch: SP = ‘0’, MLSO = ‘1’

IML > IREF > leakage

∆VML: (VREF – Vth)

FAST_PRE: high power

* P. Vlasenko, D. Perry, MOSAID Technologies Inc., US Patent 6717876, April 6, 2004

[Figure: charge-redistribution MLSA schematic. Signals: FAST_PRE, RST, VREF, VDD, SP, MLSO, IREF, ML; capacitances CML, CSP; transistor MREF.]

Page 17: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Design Techniques

MLSA Design: Charge Injection*

Reset ML and pre-charge CINJ

Charge share CINJ and CML

Match: VML = CINJ x VDD / (CINJ + CML)

Mismatch: VML = 0

Small ∆VML

Poor noise margin

Area penalty (CINJ)
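The charge-sharing formula above is easy to evaluate. The sketch below does so for made-up capacitor values (the slide gives none), to show why the resulting ∆VML is small unless CINJ is large.

```python
def charge_injection_vml(c_inj, c_ml, vdd, match):
    """ML voltage after CINJ (pre-charged to VDD) shares charge with a reset ML.

    A mismatched ML is pulled back to 0 by the mismatching cell.
    """
    if not match:
        return 0.0
    return c_inj * vdd / (c_inj + c_ml)


VDD = 1.8        # hypothetical supply (V)
C_ML = 100e-15   # hypothetical match-line capacitance (F)

for c_inj in (10e-15, 25e-15, 100e-15):  # hypothetical injection capacitors
    v = charge_injection_vml(c_inj, C_ML, VDD, match=True)
    print(f"CINJ = {c_inj * 1e15:.0f} fF -> VML(match) = {v:.2f} V")
```

With these illustrative numbers, even a CINJ equal to CML only reaches VDD/2, which is the area-versus-margin trade-off the slide lists.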

[Figure: charge-injection MLSA schematic. Signals: PRE, CHARGE_IN, RST, ML, MLSO, VDD; capacitances CINJ, CML; offset SA.]

* G. Kasai, Y. Takarabe, K. Furumi, and M. Yoneda, SONY Corp., Proc. IEEE CICC, pp. 387-390, Sep. 2003

Page 18: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Design Techniques Low Power: Selective Pre-charge*

MLs: two segments

If MATCH in pre-search, run main-search

Number of bits in pre-search depends on data statistics

[Figure: ML split into ML1 (MLSA1, output MLSO1) and ML2 (MLSA2, output MLSO2), shown for the PRE-SEARCH and MAIN-SEARCH phases.]

* C. Zukowski and S. Wang, Proc. IEEE ISCAS, pp. 745-770, Jun. 9-12, 1997
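A behavioural sketch of the two-phase search follows. It assumes the pre-search covers the first `pre_bits` bits of each word and that the main segment is evaluated only for words that pass it. The function name, the choice of leading bits and the example table (taken from the CAM vs. RAM slide) are illustrative.

```python
CAM = ["01010101", "00001101", "11001011", "10001101", "00011101", "11000111"]


def selective_precharge_search(table, key, pre_bits):
    """Return (matching addresses, number of main-segment evaluations)."""
    matches = []
    main_evaluations = 0
    for addr, word in enumerate(table):
        # Pre-search: compare only the first segment of the match line.
        if word[:pre_bits] != key[:pre_bits]:
            continue
        # Main search: evaluated only for words that survived the pre-search.
        main_evaluations += 1
        if word[pre_bits:] == key[pre_bits:]:
            matches.append(addr)
    return matches, main_evaluations


print(selective_precharge_search(CAM, "10001101", pre_bits=3))  # ([3], 1)
```

Here three pre-search bits filter out five of the six words. How well a given number of bits filters in practice depends on the stored data, as the slide notes.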

Page 19: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Design Techniques Low Power: Dual-ML TCAM*

MLSA1 is enabled first

MLSA2 is enabled if MLSO1 = ‘1’

* N. Mohan, M. Sachdev, Proc. IEEE ISCAS, pp. 633-636, May 23-26, 2004

[Figure: dual-ML TCAM word. CAM cells (0) to (N) with bit lines BL1c/BL2c and search lines SL1(0)/SL2(0) to SL1(N)/SL2(N); ML1 is sensed by MLSA1 (output MLSO1) and ML2 by MLSA2 (output MLSO2).]

Page 20: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Design Techniques Low Power: Dual-ML TCAM

Cap(ML1) = Cap(ML2) = ½ C(ML)

Same speed, 50% less energy (ideally!)

Parasitic interconnects degrade both speed and energy

Additional ML increases coupling capacitance

Page 21: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Design Techniques

Low Power: Dual-ML TCAM

Simulation results (144 bits)*

Interconnect cap. = 27 fF

W/L = 0.6 µm / 0.18 µm

          Old    New    Difference
TS (ns)   8.14   8.46   +4%
E1 (fJ)   769    426    -45%
E2 (fJ)   769    973    +26%

* N. Mohan, M. Sachdev, Proc. IEEE ISCAS, pp. 633-636, May 23-26, 2004

Page 22: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Design Techniques Low Power: Dual-ML TCAM*

EAVG = PML1 x E1 + (1 – PML1) x E2

SA1 cannot detect Type I mismatches

For ‘M’ mismatches, PML1 = 1 – (0.5)^M

Mismatch  SL1  SL2  BL1  BL2
Type I    0    1    1    0
Type II   1    0    0    1

[Figure: ML1 pull-down path. Labels: SL1, BL1c, ML1.]

* N. Mohan, M. Sachdev, Proc. IEEE ISCAS, pp. 633-636, May 23-26, 2004
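The average-energy expression can be evaluated with the simulated energies from the results table (E1 = 426 fJ and E2 = 973 fJ for the dual-ML word, 769 fJ for the traditional word). The sketch assumes E1 is spent when ML1 alone detects the mismatch and E2 when both match lines are evaluated, which is how the formula reads.

```python
E1_FJ = 426.0     # dual-ML, from the results table
E2_FJ = 973.0     # dual-ML, from the results table
E_OLD_FJ = 769.0  # traditional single-ML word, from the results table


def p_ml1(m):
    """Probability that ML1 detects a mismatch, given m mismatching bits."""
    return 1.0 - 0.5 ** m


def e_avg(m):
    """Average search energy of one dual-ML word (fJ)."""
    p = p_ml1(m)
    return p * E1_FJ + (1.0 - p) * E2_FJ


for m in range(1, 7):
    saving = 100.0 * (E_OLD_FJ - e_avg(m)) / E_OLD_FJ
    print(f"M = {m}: EAVG = {e_avg(m):6.1f} fJ, saving = {saving:4.1f}%")
```

At M = 6 the computed saving is about 43%, and it approaches the 45% of the E1 row as M grows further.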

Page 23: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Design Techniques Low Power: Dual-ML TCAM*

* N. Mohan, M. Sachdev, Proc. IEEE ISCAS, pp. 633-636, May 23-26, 2004

[Figure: average energy (fJ/bit/search, axis 0 to 6) versus number of mismatches M (1 to 6) for the traditional and dual-ML TCAMs; annotated saving: 43%.]

Page 24: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Design Techniques Low Power: Hierarchical SLs*

144 bits (5 segments: 8, 34, 34, 34, 34)

SLs: multiple blocks (64 words each)

∆VGSL: 0.45 V (VDD = 1.8 V)

Logic complexity

Search time/latency

64-bit OR gates

* Pagiamtzis et al., Proc. IEEE CICC, pp. 383-386, Sep. 2003

Page 25: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Design Techniques Static Power Reduction

16T TCAM: Leakage Paths*

[Figure: 16T TCAM cell with leakage paths marked. Signals: WL, BL1, BL1c, BL2, BL2c, SL1, SL2, ML; storage nodes BL1c_cell, BL2c_cell; transistors N1 to N12, P1 to P4; node values ‘0’ and ‘1’ annotated.]

* N. Mohan, M. Sachdev, Proc. IEEE CCECE, pp. 711-714, May 2-5, 2004

Page 26: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Design Techniques Static Power Reduction

Technology scaling (ref. 1)

Dimensions: 30% smaller

Dynamic power: 50% lower

Leakage current: 5x higher

Architectural-level techniques (refs. 2, 3)

A small portion is enabled

1. S. Borkar, IEEE Micro, pp. 23-29, Jul.-Aug. 1999

2. K. Pagiamtzis, A. Sheikholeslami, Proc. IEEE CICC, pp. 383-386, Sep. 2003

3. G. Kasai, Y. Takarabe, K. Furumi, M. Yoneda, Proc. IEEE CICC, pp. 387-390, Sep. 2003
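The scaling figures compound quickly. The sketch below assumes they apply per technology generation (the slide does not say so explicitly) and tracks how leakage grows relative to dynamic power. The function name and the normalisation to 1.0 at generation 0 are illustrative.

```python
# Assumed per-generation factors, read from the slide's scaling figures.
DIMENSION_FACTOR = 0.7      # dimensions shrink by 30%
DYNAMIC_POWER_FACTOR = 0.5  # dynamic power falls by 50%
LEAKAGE_FACTOR = 5.0        # leakage current rises 5x


def after_generations(n):
    """Relative values after n generations, normalised to 1.0 at n = 0."""
    return {
        "dimension": DIMENSION_FACTOR ** n,
        "dynamic_power": DYNAMIC_POWER_FACTOR ** n,
        "leakage": LEAKAGE_FACTOR ** n,
    }


for n in range(4):
    s = after_generations(n)
    ratio = s["leakage"] / s["dynamic_power"]
    print(n, s, f"leakage/dynamic = {ratio:g}x")
```

Under these assumptions the leakage-to-dynamic ratio grows tenfold per generation, which is the motivation for the static power techniques on the following slides.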

Page 27: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Design Techniques Static Power Reduction

Leakage current*: ISUB depends on VDD

$$I_{SUB} = I_0 \exp\left(\frac{k_{S1} + k_{S2} V_{DD}}{n V_T}\right)$$

* R. X. Gu, M. I. Elmasry, IEEE JSSC, vol. 31, no. 5, pp. 707-713, May 1996

Page 28: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Design Techniques

Static Power Reduction

Side Effects of VDD Reduction in TCAM Cells

Speed: no change

Dynamic power: no change

Robustness: VDD affects the voltage margin (current-race sensing)

[Figure: waveforms of ML [0], ML [1] and MLSO [0], with the voltage margin marked.]

Page 29: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Design Techniques Static Power Reduction

Voltage Margin of 144-bit TCAM word in 0.18 µm CMOS*

[Figure: voltage margin (mV, axis 200 to 500) versus VDD (V, from 1.8 down to 0.8).]

* N. Mohan, M. Sachdev, Proc. IEEE CCECE, pp. 711-714, May 2-5, 2004

Page 30: Content Addressable Memories Cell Design and Peripheral Circuits

CAM: Design Techniques Static Power Reduction

Effects of Technology Scaling*

Berkeley predictive technology model (BPTM)

[Figure: leakage current (nA, log axis 0.1 to 1000) versus VDD (V, 0.3 to 1.2) for the 130 nm, 100 nm, 70 nm and 45 nm technologies.]

* N. Mohan, M. Sachdev, Proc. IEEE CCECE, pp. 711-714, May 2-5, 2004