isscc 2003 / session 17 / sram and dram / paper 17ali/papers/isscc2003.pdf · 17 • 2003 ieee...

10
ISSCC 2003 / SESSION 17 / SRAM AND DRAM / PAPER 17.3 17.3 A Current-Saving Match-Line Sensing Scheme for Content-Addressable Memories Igor Arsovski, Ali Sheikholeslami University of Toronto, Toronto, Canada A Content-Addressable Memory (CAM) searches for data by its content and returns the address of the matching data. This fea- ture is used extensively in applications such as internet routers to channel incoming packets towards their destination address- es contained in the packet header. Energy per search and search speed are two important metrics used to evaluate CAM perfor- mance [1]. A current-saving match-line (ML) sensing scheme is proposed that substantially reduces the energy per search with- out compromising search speed. The proposed sensing scheme consumes only 1.3fJ/bit/search, a 60% power reduction compared to previously reported sensing schemes [2,3], to achieve a search time of less than 2ns in a 256x144b array implemented in a 0.13µm CMOS process. Figure 17.3.1 shows a general CAM architecture where search- lines (SL) run perpendicular to MLs. The search data is presented to the SLs which are connected bit-by-bit to all the words stored in the memory. A NAND-based ML architecture is well known [4] for its low power consumption (due to switching of only one ML) and relatively long search time (due to having several transistors in series). In contrast, a NOR-based CAM provides a much faster search by pulling down a precharged-high ML through parallel NMOS transistors that form the NORs. This speed however comes at the price of higher switching activities of the MLs, i.e., all MLs are charged to VDD and then to ground. Figure 17.3.2 shows the circuit details of the current-saving ML sensing scheme. To explain the circuit operations of this scheme, ignore the effects of the current-saving block and assume the VAR node is grounded [2]. Prior to a search operation, search data is applied to the SLs while all MLs are precharged to ground and all 'sn' nodes are precharged high. A CAM search begins by lowering ML_EN and allowing the PMOS transistors to provide identical currents (I ML ) to all MLs. An ML0 (ML with no mismatch) is charged faster than any ML having a one-bit mismatch (ML1) or more (MLn for an ML with n-bit miss). When the Dummy ML (DML, equivalent to an ML0) reaches the threshold voltage of the NMOS transistor (V Tn ) in the ML sense circuit, the 'sn' node is discharged, flagging ML_EN to turn off all current sources, hence preventing further power consumption in the array. The signal of the DML is slightly delayed, in fact, to let ML0 go above the threshold voltage (V Tn ) while all other MLs stay below V Tn . Without the current-saving block, this architec- ture consumes power uniformly-distributed across the MLs. That is, all MLs will consume the same amount of power inde- pendent of the number of mismatches in the ML. The current- saving block is added to reduce the power consumed by the MLs having one-bit miss or more. This is achieved through dynami- cally allocating less current to slower-rising MLs as next described. The VAR node is precharged to ground to guarantee all MLs ini- tially receive the same level of current. This node is then gradu- ally pulled up by a threshold current (I Th ) or pulled down by a current proportional to the ML voltage (I sink ). In case of an ML0 (or DML), I sink is initially less than I Th , but gradually rises above I Th , pulling VAR to ground and providing a maximum I ML to the corresponding ML. In case of an ML1 (or MLn with n higher than 1), I sink remains less than I Th , allowing the difference to charge VAR and cut off I ML to the corresponding ML. Figures 17.3.3 and 17.3.4 show the simulation results of the cur- rent-saving ML sensing scheme. Figure 17.3.3 compares the volt- ages developing on ML0 and ML1, both initially at ground. In less than 0.5ns, VAR0 and VAR1, corresponding to ML0 and ML1, respectively, begin to separate, helping ML0 to rise faster than ML1. In less than 2ns, the voltage difference between ML0 and ML1 reaches at least 200mV. This is far larger than any threshold-voltage mismatch of the NMOS transistors signaling a global shut-down of all current sources with ML_EN. Figure 17.3.4 illustrates the current supplied to the MLs during the search operation. The amount of current saving increases monot- onically with the number of mismatches in a ML. This current saving reaches nearly its maximum for ML7, for which the total charge delivered by the current source is 48% less than the total charge delivered to ML0. All other MLs (MLn with n > 7) save similar amounts of current as ML7. To verify the circuit operation under various process conditions, Fig. 17.3.5 shows the comparison of the voltage developed on ML0 with the voltage developed on a 'fast' ML1 (ML1f). To model an ML1f, its capacitance is reduced by 20% to 0.8C ML and the size of its corresponding PMOS transistor is increased by 20% to 1.2W ML . As seen in the figure, ML1f does rise faster than ML0 initially, but falls below ML0 by at least 160mV in less than 2.5ns, producing correct search results. This is made possible partly by increasing I Th in Fig. 17.3.2 through three programma- ble bits (provided off-chip) and partly by ensuring that under all process conditions, the product of I ML (max) and the resistance to ground of ML1 is less than the minimum V Tn . This guarantees that the voltage of an MLn with n 1 never reaches V Tn . Increasing I Th , however, decreases I ML , hence increasing the search time by 50% to 3ns. To compare the energy saved in the proposed scheme versus those of the conventional precharge-high scheme and the cur- rent-race scheme, all three methods are simulated in an array of 256 rows by 144b. Figure 17.3.6 shows that the current-save sensing scheme cuts the energy/bit/search by 60% when com- pared to the precharge-high scheme and by 40% when compared to the current-race sensing scheme. The proposed scheme is implemented along with current-race sensing scheme in an array of 256x144b for direct comparison of search speed and search energy in a 1.2V, 0.13µm CMOS process. The testchip layout, having a total area of 1.6mmx1.8mm, is shown in Fig. 17.3.7. Acknowledgments Authors thank Trevis Chandler, Kostas Pagiamtzis, and Marcus van Ierssel for insightful discussions on this work. Authors also thank Canadian Microelectronics Corporation (CMC) for testchip fabrication, and Natural Sciences and Engineering Research Council of Canada (NSERC) for funding. References: [1] I. Y. L. Hsiao, D. H. Wang and C. W. Jen, "Power Modeling and Low- Power Design of Content Addressable Memories,” IEEE ISCAS, Vol. 4, pp. 926 -929, 2001. [2] I. Arsovski, T. Chandler, and A. Sheikholeslami, "A Ternary Content- Addressable Memory (TCAM) Based on 4T Static Storage and Including a Current-Race Sensing Scheme,” to appear in the IEEE J. Solid State Circuits, Jan. 2003. [3] P. Lin and J. Kuo, "A 1-V 128-kb Four-Set-Associative CMOS Cache Memory Using Wordline-Oriented Tag Compare (WLOTC) Structure with Content-Addressable Memory (CAM) 10-Transistor Tag Cell,” IEEE J. Solid State Circuits, Vol. 36, No. 4, pp. 666-676, Apr. 2001. [4] F. Shafai, K. J. Schultz, R. Gibson, et al., "Fully Parallel 30-MHz, 2.5- Mb CAM,” IEEE J. Solid State Circuits, Vol. 33, No. 11, pp. 1690-1696, Nov. 1998. 2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE

Upload: others

Post on 18-Jan-2021

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ISSCC 2003 / SESSION 17 / SRAM AND DRAM / PAPER 17ali/papers/isscc2003.pdf · 17 • 2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE Figure

ISSCC 2003 / SESSION 17 / SRAM AND DRAM / PAPER 17.3

17.3 A Current-Saving Match-Line Sensing Scheme for Content-Addressable Memories

Igor Arsovski, Ali Sheikholeslami

University of Toronto, Toronto, Canada

A Content-Addressable Memory (CAM) searches for data by itscontent and returns the address of the matching data. This fea-ture is used extensively in applications such as internet routersto channel incoming packets towards their destination address-es contained in the packet header. Energy per search and searchspeed are two important metrics used to evaluate CAM perfor-mance [1]. A current-saving match-line (ML) sensing scheme isproposed that substantially reduces the energy per search with-out compromising search speed. The proposed sensing schemeconsumes only 1.3fJ/bit/search, a 60% power reduction comparedto previously reported sensing schemes [2,3], to achieve a searchtime of less than 2ns in a 256x144b array implemented in a0.13µm CMOS process.

Figure 17.3.1 shows a general CAM architecture where search-lines (SL) run perpendicular to MLs. The search data is presentedto the SLs which are connected bit-by-bit to all the words stored inthe memory. A NAND-based ML architecture is well known [4] forits low power consumption (due to switching of only one ML) andrelatively long search time (due to having several transistors inseries). In contrast, a NOR-based CAM provides a much fastersearch by pulling down a precharged-high ML through parallelNMOS transistors that form the NORs. This speed however comesat the price of higher switching activities of the MLs, i.e., all MLsare charged to VDD and then to ground.

Figure 17.3.2 shows the circuit details of the current-saving MLsensing scheme. To explain the circuit operations of this scheme,ignore the effects of the current-saving block and assume theVAR node is grounded [2]. Prior to a search operation, searchdata is applied to the SLs while all MLs are precharged toground and all 'sn' nodes are precharged high. A CAM searchbegins by lowering ML_EN and allowing the PMOS transistorsto provide identical currents (IML) to all MLs. An ML0 (ML withno mismatch) is charged faster than any ML having a one-bitmismatch (ML1) or more (MLn for an ML with n-bit miss). Whenthe Dummy ML (DML, equivalent to an ML0) reaches thethreshold voltage of the NMOS transistor (VTn) in the ML sensecircuit, the 'sn' node is discharged, flagging ML_EN to turn off allcurrent sources, hence preventing further power consumption inthe array. The signal of the DML is slightly delayed, in fact, to letML0 go above the threshold voltage (VTn) while all other MLsstay below VTn. Without the current-saving block, this architec-ture consumes power uniformly-distributed across the MLs.That is, all MLs will consume the same amount of power inde-pendent of the number of mismatches in the ML. The current-saving block is added to reduce the power consumed by the MLshaving one-bit miss or more. This is achieved through dynami-cally allocating less current to slower-rising MLs as nextdescribed.

The VAR node is precharged to ground to guarantee all MLs ini-tially receive the same level of current. This node is then gradu-ally pulled up by a threshold current (ITh) or pulled down by acurrent proportional to the ML voltage (Isink). In case of an ML0(or DML), Isink is initially less than ITh, but gradually rises aboveITh, pulling VAR to ground and providing a maximum IML to thecorresponding ML. In case of an ML1 (or MLn with n higher than1), Isink remains less than ITh, allowing the difference to chargeVAR and cut off IML to the corresponding ML.

Figures 17.3.3 and 17.3.4 show the simulation results of the cur-rent-saving ML sensing scheme. Figure 17.3.3 compares the volt-ages developing on ML0 and ML1, both initially at ground. Inless than 0.5ns, VAR0 and VAR1, corresponding to ML0 andML1, respectively, begin to separate, helping ML0 to rise fasterthan ML1. In less than 2ns, the voltage difference between ML0and ML1 reaches at least 200mV. This is far larger than anythreshold-voltage mismatch of the NMOS transistors signaling aglobal shut-down of all current sources with ML_EN. Figure17.3.4 illustrates the current supplied to the MLs during thesearch operation. The amount of current saving increases monot-onically with the number of mismatches in a ML. This currentsaving reaches nearly its maximum for ML7, for which the totalcharge delivered by the current source is 48% less than the totalcharge delivered to ML0. All other MLs (MLn with n > 7) savesimilar amounts of current as ML7.

To verify the circuit operation under various process conditions,Fig. 17.3.5 shows the comparison of the voltage developed onML0 with the voltage developed on a 'fast' ML1 (ML1f). To modelan ML1f, its capacitance is reduced by 20% to 0.8CML and the sizeof its corresponding PMOS transistor is increased by 20% to1.2WML. As seen in the figure, ML1f does rise faster than ML0initially, but falls below ML0 by at least 160mV in less than2.5ns, producing correct search results. This is made possiblepartly by increasing ITh in Fig. 17.3.2 through three programma-ble bits (provided off-chip) and partly by ensuring that under allprocess conditions, the product of IML (max) and the resistance toground of ML1 is less than the minimum VTn. This guaranteesthat the voltage of an MLn with n ≥ 1 never reaches VTn.Increasing ITh, however, decreases IML, hence increasing thesearch time by 50% to 3ns.

To compare the energy saved in the proposed scheme versusthose of the conventional precharge-high scheme and the cur-rent-race scheme, all three methods are simulated in an array of256 rows by 144b. Figure 17.3.6 shows that the current-savesensing scheme cuts the energy/bit/search by 60% when com-pared to the precharge-high scheme and by 40% when comparedto the current-race sensing scheme.

The proposed scheme is implemented along with current-racesensing scheme in an array of 256x144b for direct comparison ofsearch speed and search energy in a 1.2V, 0.13µm CMOS process.The testchip layout, having a total area of 1.6mmx1.8mm, isshown in Fig. 17.3.7.

AcknowledgmentsAuthors thank Trevis Chandler, Kostas Pagiamtzis, and Marcus vanIerssel for insightful discussions on this work. Authors also thankCanadian Microelectronics Corporation (CMC) for testchip fabrication,and Natural Sciences and Engineering Research Council of Canada(NSERC) for funding.

References:[1] I. Y. L. Hsiao, D. H. Wang and C. W. Jen, "Power Modeling and Low-Power Design of Content Addressable Memories,” IEEE ISCAS, Vol. 4, pp.926 -929, 2001.[2] I. Arsovski, T. Chandler, and A. Sheikholeslami, "A Ternary Content-Addressable Memory (TCAM) Based on 4T Static Storage and Including aCurrent-Race Sensing Scheme,” to appear in the IEEE J. Solid StateCircuits, Jan. 2003.[3] P. Lin and J. Kuo, "A 1-V 128-kb Four-Set-Associative CMOS CacheMemory Using Wordline-Oriented Tag Compare (WLOTC) Structure withContent-Addressable Memory (CAM) 10-Transistor Tag Cell,” IEEE J.Solid State Circuits, Vol. 36, No. 4, pp. 666-676, Apr. 2001.[4] F. Shafai, K. J. Schultz, R. Gibson, et al., "Fully Parallel 30-MHz, 2.5-Mb CAM,” IEEE J. Solid State Circuits, Vol. 33, No. 11, pp. 1690-1696,Nov. 1998.

• 2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE

Page 2: ISSCC 2003 / SESSION 17 / SRAM AND DRAM / PAPER 17ali/papers/isscc2003.pdf · 17 • 2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE Figure

ISSCC 2003 / February 12, 2003 / Salon 1-6 / 9:30 AM

17

Figure 17.3.1: CAM ML architectures. Figure 17.3.2: Current-saving match-line sensing scheme.

Figure 17.3.6: NOR architecture: energy-per-search comparison.

m

Search Data

Differential Search Lines

Mat

chL

ines

m

NAND ML architecture

NOR ML architecture

d d

SLn SLn

md d

SL0 SL0

m

SL0 SL0 SLn SLn

ML

ML

d d d d

0 1 1 0

0

1

0

0

1

0

1

0

0

1

1

0

0

1

0

0

miss

match

miss

miss

Current SavingControl

m

SLn SLnML

d d

biasML_EN

VAR

pre

sn MLS

Current-Saving ML sense circuitCurrent Source

ML

Control

DML

ML(0)

ML(1)

ML(m)

biasbiasprog.delaysearch-lines

ML_EN

ML_EN

ITh

Isink

prog.

IML

ML0

ML1

VAR1

MLS0ML_EN

Time (ns)

10 11 12 13

MLS0

VAR0

1.2

0.9

0.6

0.3

0

Vo

ltag

e (V

)

IML0

Time (ns)

IML7

10 11 12 13

IML1

IML2

IML3

Cu

rren

t (µ

A)

120

80

40

0

ML1f

ML0

VAR0

VAR1f

MLS0

10 11 12 13

Time (ns)

1.2

0.9

0.6

0.3Vo

ltag

e (V

)

0 1xRIR

0.8CML

1.2WMLCurrentVAR1f

ML1f

SavingControl

1.2WSW

CML

WMLCurrentVAR0

ML0

SavingControl

WSW

one-bit miss (worst-case model)

160mV

full match (typical model)

ML_EN MLS0

precharged-high current-race current-saving

MLsMLs

MLs

SLs

SLs

SLs

2.1

1.3

En

erg

y(f

J / b

it /

sear

ch)

3.4

1.4

0.7

5

4

3

2

1

0

[3] [2] [this work]

2.1

Figure 17.3.3: Voltage development on ML0 (fully-matched) and ML1(one-bit miss).

Figure 17.3.4: Current supplied to ML0, ML1,...,and ML7. (MLn is anML with n-bit mismatch).

Figure 17.3.5: Simulation results comparing the voltage on a typicalML0 against the voltage on a fast-rising ML1 (worst-case one-bit miss).

• 2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE

Page 3: ISSCC 2003 / SESSION 17 / SRAM AND DRAM / PAPER 17ali/papers/isscc2003.pdf · 17 • 2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE Figure

17

• 2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE

Figure 17.3.7: Chip Layout.

ML

SA

1M

LS

A2

ML

SA

3M

LS

A4

MEMORY ARRAY1

MEMORY ARRAY2

MEMORY ARRAY3

MEMORY ARRAY4

BIT/SEARCH LINES DRIVERS

WO

RD

-LIN

E D

RIV

ER

S

Page 4: ISSCC 2003 / SESSION 17 / SRAM AND DRAM / PAPER 17ali/papers/isscc2003.pdf · 17 • 2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE Figure

• 2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE

Figure 17.3.1: CAM ML architectures.

m

Search Data

Differential Search Lines

Mat

chL

ines

m

NAND ML architecture

NOR ML architecture

d d

SLn SLn

md d

SL0 SL0

m

SL0 SL0 SLn SLn

ML

ML

d d d d

0 1 1 0

0

1

0

0

1

0

1

0

0

1

1

0

0

1

0

0

miss

match

miss

miss

Page 5: ISSCC 2003 / SESSION 17 / SRAM AND DRAM / PAPER 17ali/papers/isscc2003.pdf · 17 • 2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE Figure

• 2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE

Figure 17.3.2: Current-saving match-line sensing scheme.

Current SavingControl

m

SLn SLnML

d d

biasML_EN

VAR

pre

sn MLS

Current-Saving ML sense circuitCurrent Source

ML

Control

DML

ML(0)

ML(1)

ML(m)

biasbiasprog.delaysearch-lines

ML_EN

ML_EN

ITh

Isink

prog.

IML

Page 6: ISSCC 2003 / SESSION 17 / SRAM AND DRAM / PAPER 17ali/papers/isscc2003.pdf · 17 • 2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE Figure

• 2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE

ML0

ML1

VAR1

MLS0ML_EN

Time (ns)

10 11 12 13

MLS0

VAR0

1.2

0.9

0.6

0.3

0

Vo

ltag

e (V

)

Figure 17.3.3: Voltage development on ML0 (fully-matched) and ML1 (one-bit miss).

Page 7: ISSCC 2003 / SESSION 17 / SRAM AND DRAM / PAPER 17ali/papers/isscc2003.pdf · 17 • 2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE Figure

• 2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE

IML0

Time (ns)

IML7

10 11 12 13

IML1

IML2

IML3

Cu

rren

t (µ

A)

120

80

40

0

Figure 17.3.4: Current supplied to ML0, ML1,...,and ML7. (MLn is an ML with n-bit mismatch).

Page 8: ISSCC 2003 / SESSION 17 / SRAM AND DRAM / PAPER 17ali/papers/isscc2003.pdf · 17 • 2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE Figure

• 2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE

ML1f

ML0

VAR0

VAR1f

MLS0

10 11 12 13

Time (ns)

1.2

0.9

0.6

0.3Vo

ltag

e (V

)

0 1xRIR

0.8CML

1.2WMLCurrentVAR1f

ML1f

SavingControl

1.2WSW

CML

WMLCurrentVAR0

ML0

SavingControl

WSW

one-bit miss (worst-case model)

160mV

full match (typical model)

ML_EN MLS0

Figure 17.3.5: Simulation results comparing the voltage on a typical ML0 against the voltage on afast-rising ML1 (worst-case one-bit miss).

Page 9: ISSCC 2003 / SESSION 17 / SRAM AND DRAM / PAPER 17ali/papers/isscc2003.pdf · 17 • 2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE Figure

• 2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE

Figure 17.3.6: NOR architecture: energy-per-search comparison.

precharged-high current-race current-saving

MLsMLs

MLs

SLs

SLs

SLs

2.1

1.3

En

erg

y(f

J / b

it /

sear

ch)

3.4

1.4

0.7

5

4

3

2

1

0

[3] [2] [this work]

2.1

Page 10: ISSCC 2003 / SESSION 17 / SRAM AND DRAM / PAPER 17ali/papers/isscc2003.pdf · 17 • 2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE Figure

• 2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE

Figure 17.3.7: Chip Layout.

ML

SA

1M

LS

A2

ML

SA

3M

LS

A4

MEMORY ARRAY1

MEMORY ARRAY2

MEMORY ARRAY3

MEMORY ARRAY4

BIT/SEARCH LINES DRIVERSW

OR

D-L

INE

DR

IVE

RS