a high-speed and low-voltage associative co-processor with

4
A High-Speed and Low-Voltage Associative Co-Processor With Hamming Distance Ordering Using Word-Parallel and Hierarchical Search Architecture Yusuke Oike , Makoto Ikeda †‡ , and Kunihiro Asada †‡ Dept. of Electronic Engineering, University of Tokyo VLSI Design and Education Center (VDEC), University of Tokyo 7–3–1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan Phone: +81-3-5841-6719, Fax: +81-3-5841-8912 E-mail: {y-oike, ikeda, asada}@silicon.u-tokyo.ac.jp Abstract We present a new concept and its circuit implementation for a high-speed and low-voltage associative co-processor with Hamming distance ordering. A hierarchical search architec- ture keeps high speed in large input number. Our circuit im- plementation allows unlimited data base capacity and achieves low-voltage operation under 1.0V for SoC applications, which are dicult for the conventional analog approaches. The search logic embedded in a memory cell realizes word-parallel Ham- ming distance ordering for high-speed sorting/routing appli- cations as well as near/nearest-match detection for recogni- tion. Our fabricated 0.18 μm 64-bit 32-word associative co- processor operates 411.5 MHz and 40.0 MHz at 1.8V and 0.75V respectively. Introduction Some applications, such as data compression, pattern recogni- tion, multi-media and intelligent processing system, require a huge amount of memory access and data processing time. To reduce them, a lot of context addressable memories (CAMs) are developed [1]–[3]. These CAMs can quickly detect a com- pletely matched data in a data base. In recent years, advanced applications require to detect not only a completely matched data but also a near-match data. The CAMs using analog cir- cuit technologies have been proposed for quick nearest-match detection [4]–[8]. Their circuit implementations are compact in general, however, dicult to operate in deep sub-micron (DSM) process and low voltage supply. Therefore they are not suitable for a system-on-chip VLSI in DSM process. In this paper, we present a high-speed and low-voltage as- sociative co-processor using a hierarchical search architecture with the capability of word-parallel Hamming distance order- ing. It has three advantages: (1) The first advantage is high- speed search in large data base due to a hierarchical search ar- chitecture. The search time of our method is limited by O( N) or O(log M) at N-bit M-word data capacity. In addition, it has no limitation of the number of data patterns M, the bit length N and the search distance theoretically. (2) The second advan- tage is low-voltage operation in DSM. The circuit implementa- tion has a tolerance for device fluctuation in DSM and allows a low-voltage operation under 1.0V, which is dicult for the conventional analog approaches. (3) The third advantage is ad- ditional functions for associative processing. The synchronous search logic embedded in a memory cell realizes word-parallel Hamming distance ordering for high-speed sorting/routing ap- plications as well as near/nearest-match detection for recogni- 0 1 0 1 0 0 1 1 Data A A B 1 1 0 0 0 1 1 1 1 0 0 1 0 1 0 0 Data B b0 b1 b2 b3 b4 b5 b6 b7 b0 b1 b2 b3 b4 b5 b6 b7 + n bit match bit (SS passes through) mis-match bit (SS stops) b0 b1 b2 b3 b4 b5 b6 b7 search signal(SS) stop stop stop mask mask mask clock period 0 clock period 1 clock period 2 clock period 3 pass through pass through pass through detected Fig. 1 Basic Operation without Hierarchical Search. tion. A 64-bit 32-word associative co-processor has been fab- ricated in 0.18 μm CMOS process and successfully tested. Architecture of Hamming Distance Ordering A. Basic Search Operation Fig.1 shows a basic operation of Hamming distance (HD) or- dering without hierarchical search. The operation has a data comparison and a search signal propagation. First, the input data is compared with each template data in bit-parallel way using XOR/XNOR gates. Then search signals start from LSBs of each word. The search signal passes through the matched bit data. The completely matched data (HD = 0) are detected in the first clock period since the search signal passes through MSB. In the next clock period, the first-encountered mismatched bit is masked in each word and the search signals restart to the next mismatched bit. Thus, the data of HD = 1 are detected. After this manner, the data of HD = n are detected in the n-th clock period as shown in Fig.1. The search operation can de- tect not only the nearest-match data but also all data in order of Hamming distance. B. Word-Parallel and Hierarchical Search Structure Search time of the basic operation is limited by the propaga- tion time of the search signal, so it is linearly-related to a data length in a ripple-mode implementation. Fig.2 shows an op- eration diagram of word-parallel and hierarchical structure for high-speed Hamming distance search in large input number. The template data is divided into some blocks. The search sig- nal (SS) propagates to hierarchical search nodes (HN) through each block simultaneously as shown in Fig.2 (a). Each hierar- chical node provides a permission signal (PS) to the next block as shown in Fig.2 (b). The permission signal makes a mis- matched bit maskable. At the next clock period, the maskable mismatched bit is masked only when the search signal is ar-

Upload: others

Post on 01-Feb-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

A High-Speed and Low-Voltage Associative Co-Processor With Hamming DistanceOrdering Using Word-Parallel and Hierarchical Search Architecture

Yusuke Oike†, Makoto Ikeda†‡, and Kunihiro Asada†‡

†Dept. of Electronic Engineering, University of Tokyo‡VLSI Design and Education Center (VDEC), University of Tokyo

7–3–1 Hongo, Bunkyo-ku, Tokyo 113-8656, JapanPhone:+81-3-5841-6719, Fax:+81-3-5841-8912

E-mail: {y-oike, ikeda, asada}@silicon.u-tokyo.ac.jp

Abstract

We present a new concept and its circuit implementation fora high-speed and low-voltage associative co-processor withHamming distance ordering. A hierarchical search architec-ture keeps high speed in large input number. Our circuit im-plementation allows unlimited data base capacity and achieveslow-voltage operation under 1.0V for SoC applications, whichare difficult for the conventional analog approaches. The searchlogic embedded in a memory cell realizes word-parallel Ham-ming distance ordering for high-speed sorting/routing appli-cations as well as near/nearest-match detection for recogni-tion. Our fabricated 0.18µm 64-bit 32-word associative co-processor operates 411.5 MHz and 40.0 MHz at 1.8V and0.75V respectively.

Introduction

Some applications, such as data compression, pattern recogni-tion, multi-media and intelligent processing system, require ahuge amount of memory access and data processing time. Toreduce them, a lot of context addressable memories (CAMs)are developed [1]–[3]. These CAMs can quickly detect a com-pletely matched data in a data base. In recent years, advancedapplications require to detect not only a completely matcheddata but also a near-match data. The CAMs using analog cir-cuit technologies have been proposed for quick nearest-matchdetection [4]–[8]. Their circuit implementations are compactin general, however, difficult to operate in deep sub-micron(DSM) process and low voltage supply. Therefore they are notsuitable for a system-on-chip VLSI in DSM process.

In this paper, we present a high-speed and low-voltage as-sociative co-processor using a hierarchical search architecturewith the capability of word-parallel Hamming distance order-ing. It has three advantages: (1) The first advantage is high-speed search in large data base due to a hierarchical search ar-chitecture. The search time of our method is limited by O(

√N)

or O(logM) at N-bit M-word data capacity. In addition, it hasno limitation of the number of data patternsM, the bit lengthN and the search distance theoretically. (2) The second advan-tage is low-voltage operation in DSM. The circuit implementa-tion has a tolerance for device fluctuation in DSM and allowsa low-voltage operation under 1.0V, which is difficult for theconventional analog approaches. (3) The third advantage is ad-ditional functions for associative processing. The synchronoussearch logic embedded in a memory cell realizes word-parallelHamming distance ordering for high-speed sorting/routing ap-plications as well as near/nearest-match detection for recogni-

0 1 0 1 0 0 1 1Data A

A B

1 1 0 0 0 1 1 11 0 0 1 0 1 0 0

Data B

b0 b1 b2 b3 b4 b5 b6 b7

b0 b1 b2 b3 b4 b5 b6 b7

+n bit

match bit (SS passes through)

mis-match bit (SS stops)

b0 b1 b2 b3 b4 b5 b6 b7search signal(SS)

stop

stop

stop

mask

mask

mask

clock period 0

clock period 1

clock period 2

clock period 3

pass through

pass through

pass through

detected

Fig. 1 Basic Operation without Hierarchical Search.

tion. A 64-bit 32-word associative co-processor has been fab-ricated in 0.18µm CMOS process and successfully tested.

Architecture of Hamming Distance Ordering

A. Basic Search Operation

Fig.1 shows a basic operation of Hamming distance (HD) or-dering without hierarchical search. The operation has a datacomparison and a search signal propagation. First, the inputdata is compared with each template data in bit-parallel wayusing XOR/XNOR gates. Then search signals start from LSBsof each word. The search signal passes through the matched bitdata. The completely matched data (HD= 0) are detected in thefirst clock period since the search signal passes through MSB.In the next clock period, the first-encountered mismatched bitis masked in each word and the search signals restart to thenext mismatched bit. Thus, the data of HD= 1 are detected.After this manner, the data of HD= n are detected in then-thclock period as shown in Fig.1. The search operation can de-tect not only the nearest-match data but also all data in order ofHamming distance.

B. Word-Parallel and Hierarchical Search StructureSearch time of the basic operation is limited by the propaga-tion time of the search signal, so it is linearly-related to a datalength in a ripple-mode implementation. Fig.2 shows an op-eration diagram of word-parallel and hierarchical structure forhigh-speed Hamming distance search in large input number.The template data is divided into some blocks. The search sig-nal (SS) propagates to hierarchical search nodes (HN) througheach block simultaneously as shown in Fig.2 (a). Each hierar-chical node provides a permission signal (PS) to the next blockas shown in Fig.2 (b). The permission signal makes a mis-matched bit maskable. At the next clock period, the maskablemismatched bit is masked only when the search signal is ar-

n bit cells n+1 bit cells n+2 bit cells n+3 bit cells

(a) search signal (SS) propagation path

(c) operation diagram (ex. HD=2)

clock period: 0

clock period: 1

clock period: 2

n bit cells n+1 bit cells n+2 bit cells n+3 bit cells

(b) permission signal (PS) path

output

SS propagating path

SS propagated path

PS path

hierarchical search node

detected HD=2

match bit

mis-match bit

SS stops

masked

masked

SS restarts

SS starts SS starts SS starts SS starts

maskable

maskable

SS stops

mis-match bit (maskable)

SS restartsSS restarts

mis-match bit (masked)

Fig. 2 Operation Diagram of Hierarchical Search.

rived. A number of clock cycles, when the search signal out-puts from the last hierarchical node, stands for the Hammingdistance of the data. For example, some search signals stopat the mismatched bit in each block and the others pass to thehierarchical node as shown in Fig.2 (c). The first mismatchedbit becomes maskable at the clock period 0. The data of HD= 0 is detected at the same time. At the next clock period, thefirst mismatched bit is masked and the search signal restarts.The search signal updates the permission signals and the nextmismatched bit becomes maskable. Thus the data of HD= 2 isdetected at the clock period 2. In this architecture, the criticalpath is the search signal propagation path of one block and thehierarchical bypass line. The search time has similar character-istics of a carry-bypass adder, so that it is applicable to a largedata base.

Circuit Configuration

Fig.3 shows a schematic of our associative memory cell. Thememory cell is composed of a SRAM cell, an XOR/XNORcircuit for comparison with the input data, and a search circuitfor signal propagation and masking. Even-numbered and odd-numbered search circuits are complementary in order to reducethe critical path and the circuit area. In a matched bit, the searchsignal (SS) always passes to the next bit since the result (M) ofcomparison is true. In a mismatched bit, the SS stops and waitsfor the next clock period. In the next clock period, the false M

φ1 φ1

φ1φ1

φ2

φ2

φ2

φ2

B 2i B2iD 2i D2i B 2i B2iD 2iD2i

PSj-1

PSj-1SS 2i

SS 2i

SS 2i

SS 2i+1

SS 2i+1

M2i

M2i

XOR Cell

Search Cell

B 2i-1 B2i-1

W W

SS 2i-1 SS 2i-1

SS 2i-1

PSj-1

PSj-1

SS 2iSS 2i

M2i-1

M2i-1

B 2i-1 B2i-1

D 2i-1 D2i-1 D 2i-1 D2i-1

SRAM Cell

XNOR Cell

XNOR Cell

Search Cell Search Cell

XOR Cell

Search Cell

(a-1) odd-numbered cell (b-1) odd-numbered cell

(a-2) even-numbered cell (b-2) even-numbered cell

W W

SRAM CellSRAM Cell

Fig. 3 Schematic of the Associative Memory Cell: (a) Static CircuitImplementation, (b) Compact Implementation.

PRIORITY_IN1

PRIORITY_OUT1

PRIORITY_OUT2

PRIORITY_OUT3

PRIORITY_OUT4

PRIORITY_OUT5

PRIORITY_OUT6

PRIORITY_OUT7

PRIORITY_OUT8

PRIORITY_IN2

PRIORITY_IN3

PRIORITY_IN4

PRIORITY_IN5

PRIORITY_IN6

PRIORITY_IN7

PRIORITY_IN8

ADDRESS0ADDRESS1

ADDRESS2

RST

CAMOUT PRIORITY_IN

PRIORITY_OUT(a)

(b)

priority decision circuit address encoder

Fig. 4 Schematic of: (a) Detected Data Selector, (b) Binary-Tree Pri-ority Encoder.

is masked and the SS restarts at the cell where both the searchsignal (SS) and the permission signal (PS) are true. Thereforeonly one mismatched bit is masked in word parallel and alldata can be detected in order of Hamming distance. Fig.3 (a)shows static circuit implementation. It realizes a high tolerancefor device fluctuation and a low-voltage operation. Fig.3 (b)shows compact circuit implementation using dynamic circuits.It saves a search circuit area for large capacity.

Fig.4 (a) shows a detected data selector, which masks oneoutput of the detected data in the same clock period after itsaddress encoding. All detected data in the same HD can beencoded by the next priority encoder stage. Fig.4 (b) shows abinary-tree priority encoder. It realizes a small area and quickaddress encoding with O(logM) delay time atM-word capac-ity.

RAM RAM RAM

SC SC SCHN

RAM RAM RAM

SC SC SCHN

RAM RAM RAM

SC SC SCHN

W1

HB11 (n bit) HB12 (n+1 bit) HB1j (n+j-1 bit) SO1

SOi: search results per word

PIi: priority encoder inputs

POi: priority decision outputs

PI1

ADDRESSCOMP

PO1

PO2

POi

PI2

PIi

Det

ecte

d D

ata

Sel

ecto

r

Row

Dec

oder

Memory Read/Write Circuit, Data Buffer

Prio

rity

Dec

isio

n C

ircui

t and

Enc

oder

SS

RAM RAM RAM

SC SC SCHN

RAM RAM RAM

SC SC SCHN

RAM RAM RAM

SC SC SCHN

W2HB21 HB22 HB2j

SO2

SS

RAM RAM RAM

SC SC SCHN

RAM

SC

SRAM cell HBij: hierarchical cell block

SS: search signal pathembedded search circuit

hierarchical nodeHN

RAM RAM RAM

SC SC SCHN

RAM RAM RAM

SC SC SCHN

WiHBi1 HBi2 HBij

SOi

SS

PS: permission signal path

B1/B1B1/B1B1/B1

Fig. 5 Block Diagram of the Fabricated Associative Co-Processor.

2.8mm

2.8m

m

Memory R/W Circuit, Data BufferMemory R/W Circuit, Data BufferMemory R/W Circuit, Data Buffer

Priority EncoderPriority EncoderPriority EncoderRow DecoderRow DecoderRow Decoder(TEG)(TEG)(TEG) (TEG)(TEG)(TEG)

(TEG)(TEG)(TEG)

64 bit, 32 word Cells64 bit, 32 word Cells64 bit, 32 word Cells

64 bit, 2 word 64 bit, 2 word Compact ImplementationCompact Implementation

64 bit, 2 word Compact Implementation

Fig. 6 Chip Microphotograph.

Chip Implementation

We have designed and fabricated a 64-bit 32-word associativeco-processor using the present architecture and the static circuitimplementation in 0.18µm CMOS process1. Fig.5 illustratesa block diagram of the fabricated memory module and Fig.6shows its chip microphotograph and components. The asso-ciative co-processor has 64× 32 memory cells with the searchcircuit, a memory read/write circuit with data buffers, a worddecoder, and a 32-input priority encoder with a detected dataselector. We have also designed a 64-bit 2-word associativememory using the compact implementation for performanceevaluation on the same chip.

Measurement Results and Discussions

A. Area and CapacityThe designed 64-bit 32-word associative co-processor occu-pies 475µm × 1160µm (0.55 mm2). The area of a memorymacro cell with a static search circuit is 9.6µm × 13.6 µm(130.56µm2) as shown in Fig.7 (a). In the static circuit im-plementation using 0.18µm process, the cell area is×6 and

1The chip in this study has been fabricated through VLSI Design and Edu-cation Center(VDEC), University of Tokyo in collaboration with Hitachi Ltd.and Dai Nippon Printing Co.

13.6µm13.6µm

SRAM CellSRAM Cell

XOR CellXOR Cell

SRAM CellSRAM Cell

XOR CellXOR CellSearch CircuitSearch Circuit

8.8µm

9.6µ

m9.

6µm

7.2µ

m7.

2µm

(a) Static Circuit Implementation (b) Compact Implementation

Search CircuitSearch Circuit

Fig. 7 Layout of the Associative Memory Cell: (a) Static Circuit Im-plementation, (b) Compact Implementation.

0.00 2.00 4.00Time (ns)

Sig

nal l

evel

(a.

u.)

6.00

CLKCLK SOiSOi

critical path delay2.18ns

critical path delay2.18ns

Fig. 8 Measured Waveforms of the Search Signal Propagation.

×3 as large as a 6T SRAM cell and a standard complete-matchCAM cell respectively. Fig.7 (b) shows a layout of the com-pact implementation using dynamic circuits. It occupies 7.2µm × 8.8µm (63.36µm2). In this case, the cell area is×3 and×2 as large as a 6T SRAM cell and a standard complete-matchCAM cell. The number of transistors in our memory cell islarger than the conventional analog approaches [4]–[8]. Theanalog approaches are, however, difficult to follow device scal-ing especially in DSM process with keeping its performanceand marginal capacity. Our approach can follow device scal-ing and operate in low supply voltage because of synchronousdigital search logics embedded in memories. Besides, it has nolimitation of capacity and search distance. Therefore our as-sociative co-processor has more potential for practical use andlarge capacity than the conventional designs.

B. Operation Speed

Fig.8 shows measured waveforms using an electron beamprobe at room temperature. It shows a delay time of the criti-cal path from the search circuit clock (CLK) to a search out-put (SOi). The delay time for Hamming distance search in64-bit data length is 2.18 ns in the worst case. The opera-tion speed of the fabricated associative co-processor is 411.5MHz and 40.0 MHz at 1.8V and 0.75V power supply respec-tively. Fig.9 shows measurement results of the operation speedin 0.75V-to-1.8V power supply. In the Hamming distance or-dering, the search time needs clock counts corresponding toits Hamming distance. It takes 65 clock periods to order alldata from 0-bit distance to 64-bit distance. Our fabricated as-sociative co-processor completes the Hamming distance order-ing for sorting/routing of all data in 158.0 ns. It’s difficult toimplement such a function in high speed by the conventionalanalog techniques. When the target application requires only

0.8

1

1.2

1.4

1.6

1.8

0 50 100 150 200 250Frequency (MHz)

Pow

er s

uppl

y (V

)

300 350 400 450

[email protected]

measurement results

PASS

FAIL

[email protected]

Fig. 9 Operation Frequency vs Power Supply Voltage.

1.0

2.0

3.0

4.0

5.0

6.0

7.0

8.0

9.0

0 100 200 300 400 500Num. of words M (word)

Bit length N (bit)

Cyc

le ti

me

(ns)

priority encoder stage : O(logM)(simulation results, bottom axis)

distance search stage : O( N)(simulation results, upper axis)

module cycle time(measured result, 64bit 32word)

600 700 800 900 1000 1100

0 100 200 300 400 500 600 700 800 900 1000 1100

Fig. 10 Cycle Time and Data Capacity.

nearest-match data, the search time depends on its Hammingdistance. For example, the nearest-match detection is com-pleted in 41.3 ns at 16-bit Hamming distance. This operationspeed is also difficult for the conventional analog approaches.The worst case of nearest-match detection is the same as theordering operation.

Fig.10 shows the relation between a bit or word length and asearch cycle time. The search time is limited by the search sig-nal propagation or the priority encoding. The former is decidedby a bit length and its time is O(

√N) at N-bit length. On the

other hand, the latter is decided by a word length and its time isO(logM) at M-word length. Therefore our method keeps highspeed in large data base as shown in Fig.10. The Hamming dis-tance ordering has no limitation of data capacity as mentionedabove.

C. Power Dissipation

The power dissipation of the associative co-processor is< 51.3mW at 1.8V power supply and 400 MHz operation. In low-power operation, it is 1.18 mW at 0.75V power supply and40 MHz operation. The search accuracy of the conventionalanalog approach is unstable and sometimes senseless in low-power operation. Our search operations are precise regardlessof a power supply voltage. The specifications of the fabricatedco-processor are summarized in Table I.

TABLE I Specifications of the Fabricated Associative Co-Processor.Process 0.18µm CMOS 5-Metal 1-Poly-SiPower Voltage Supply 0.7 V – 1.8 VOrganization 64 Bit× 32 Word Memory Cells

32-Input Priority EncoderFunctions Distance Ordering and Nearest-MatchModule Size 475µm × 1160µm (0.55 mm2)Num. of Transistors 88.5k transistorsMemory Cell Size 9.6µm × 13.6µm (130.56µm2)

†7.2µm × 8.8µm (63.36µm2)Search Time Order O(

√N) (@ N-Bit capacity)

Encoding Time Order O(logM) (@ M-Word capacity)Operation Speed 411.5 MHz (@ 1.8V, Measured)

454.5 MHz (@ 1.8V, Simulated)40.0 MHz (@ 0.75V, Measured)41.4 MHz (@ 0.75V, Simulated)

Distance Ordering Time 154.0 ns (0-bit to 64-bit distance)Power Dissipation 51.3 mW (@ 1.8V, 400MHz)

1.18 mW (@ 0.75V, 40MHz)†designed by the compact implementation

Conclusions

We proposed a new concept and its circuit implementation fora high-speed and low-voltage associative co-processor in DSMprocess to solve the problems of the conventional analog tech-niques. It achieves no limitation of data capacity and keepshigh speed in large data base due to a hierarchical search archi-tecture and a synchronous search logic embedded in a mem-ory cell. Our extended functions, such as Hamming distanceordering, are effectively applied to high-speed sorting/routingapplications as well as near/nearest-matching applications. Wehave designed and fabricated a 64-bit 32-word associative co-processor in 0.18µm CMOS process and shown a high-speedand low-voltage operation. The operation speed achieves 411.5MHz and 40.0 MHz at 1.8 V and 0.75 V supply voltage respec-tively.

References[1] T. Ogura et al., “A 20-kbit Associative Memory LSI for Artificial In-

telligence Machines,”IEEE J. Solid-Stage Circuit, vol. 24, no. 4, pp.1014-1020, Aug. 1989.

[2] T. Ikenaga et al., “A Fully Parallel 1-Mb CAM LSI for Real-Time Pixel-Parallel Image Processing,”IEEE J. Solid-Stage Circuit, vol. 35, no. 4,pp. 536-544, Apr. 2000.

[3] H. Miyatake et al., “A Design for High-Speed Low-Power CMOS FullyParallel Content-Addressable Memory Macros,”IEEE J. Solid-StageCircuit, vol. 36, no. 6, pp. 956-968, June 2001.

[4] T. Yamashita et al., “Neuron MOS Winner-Take-All Circuit and Its Ap-plication to Associative Memory,”ISSCC Dig. Tech. Papers, pp. 236-237, Feb. 1993.

[5] M. Nagata et al., “A Minimum-Distance Search Circuit using Dual-LinePWM Signal Processing and Charge-Packet Counting Techniques,”ISSCC Dig. Tech. Papers, pp. 42-43, Feb. 1997.

[6] M. Ikeda et al., “Time-Domain Minimum-Distance Detector and Its Ap-plication to Low-Power Coding Scheme on Chip-Interface,”Proc. ofEur. Solid-Stage Circuit Conf. (ESSCIRC), pp. 464-467, 1998.

[7] H. J. Mattausch et al., “An Architecture for Compact Associative Mem-ories with Deca-ns Nearest-Match Capability up to Large Distances,”ISSCC Dig. Tech. Papers, pp. 170-171, 2001.

[8] H. J. Mattausch et al., “Fully-Parallel Pattern-Matching Engine withDynamic Adaptability to Hamming or Manhattan Distance,”Symp. onVLSI Circuits Dig. Tech. Papers, pp. 252-255, 2002.