
11. Finite-Precision Effects and Pipeline Adaptive Filters

In practice, an adaptive filter is usually implemented digitally. Thus, finite-precision problems arise.

There are two ways to implement a filter: fixed-point or floating-point. In most cases, a fixed-point implementation is preferred.

In a digital implementation, there are essentially two sources of quantization:

– A/D conversion

– Finite word-length arithmetic

The signal is usually received in analog form; the A/D converter is the device that samples and quantizes it.

Ideal A/D converter:

As we can see, quantization error always exists. If the number of quantization levels is large, we can assume the error is uniformly distributed and treat it as additive noise.

Calculation of quantization noise:

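An N-bit ADC with an input range of $[-1, 1]$ has a step size of $2/2^N$, so its quantization error lies between $-1/2^N$ and $1/2^N$ (that is, $\pm\tfrac{1}{2}\cdot\tfrac{2}{2^N}$).

The average quantization power is

$$\sigma_q^2 = \frac{(2/2^N)^2}{12} = \frac{2^{-2N}}{3}$$

Assuming that the dynamic range of the ADC and DAC is between -1 and 1 (the maximum magnitude is 0 dB), the quantization noise power in dB is

$$10\log_{10}\frac{2^{-2N}}{3} \approx -(6.02N + 4.77)\ \text{dB}$$

Thus, we have a 6-dB rule of thumb for the quantization noise: each additional bit lowers the noise power by about 6 dB.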
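The 6-dB rule can be checked numerically. The following is a minimal sketch, assuming a rounding quantizer with full scale [-1, 1] and a uniformly distributed test input; the function names are hypothetical and chosen only for illustration.

```python
import math
import random

def quantize(x, n_bits):
    """Round x to the nearest level of a uniform n_bits quantizer on [-1, 1]."""
    step = 2.0 / (2 ** n_bits)          # step size for a full scale of [-1, 1]
    return step * round(x / step)

def measured_noise_power_db(n_bits, n_samples=20000, seed=0):
    """Estimate the quantization noise power (in dB) from random samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        err = quantize(x, n_bits) - x
        total += err * err
    return 10.0 * math.log10(total / n_samples)

def predicted_noise_power_db(n_bits):
    """Uniform-error model: power = step^2 / 12 = 2^(-2N) / 3."""
    return 10.0 * math.log10((2.0 ** (-2 * n_bits)) / 3.0)

for n_bits in (8, 12, 16):
    print(n_bits,
          round(measured_noise_power_db(n_bits), 2),
          round(predicted_noise_power_db(n_bits), 2))
```

Under these assumptions, the measured and predicted values should agree closely, and each extra bit should lower the predicted noise power by about 6.02 dB.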

In a digital system, a finite word length is commonly used to store the results of internal arithmetic calculations. Thus, after an arithmetic operation (addition, multiplication), the result must be quantized, which produces a round-off or truncation effect.

Due to the above effects, the digital version of the filter may exhibit a response that deviates from the ideal one.

For a digitally realized LMS adaptive filter, there are many sources that will introduce quantization errors.

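For the input quantizer connected to the input u(n), we have

$$\mathbf{u}_q(n) = Q[\mathbf{u}(n)] = \mathbf{u}(n) + \boldsymbol{\eta}_u(n)$$

For the quantizer connected to the desired signal d(n), we have

$$d_q(n) = Q[d(n)] = d(n) + \eta_d(n)$$

For the quantized tap-weight vector, we have

$$\mathbf{w}_q(n) = Q[\mathbf{w}(n)] = \mathbf{w}(n) + \Delta\mathbf{w}(n)$$

For the filter output, we have

$$y_q(n) = Q[\mathbf{u}_q^T(n)\mathbf{w}_q(n)] = \mathbf{u}_q^T(n)\mathbf{w}_q(n) + \eta_y(n)$$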

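The finite-precision LMS algorithm is described by

$$e_q(n) = d_q(n) - y_q(n)$$

$$\mathbf{w}_q(n+1) = \mathbf{w}_q(n) + Q[\mu e_q(n)\mathbf{u}_q(n)]$$

Thus, to first order in the quantization errors, the error is

$$e(n) = d(n) - y_q(n) = d(n) - \mathbf{u}_q^T(n)\mathbf{w}_q(n) - \eta_y(n)$$

$$\approx [d(n) - \mathbf{u}^T(n)\mathbf{w}(n)] - [\Delta\mathbf{w}^T(n)\mathbf{u}(n) + \boldsymbol{\eta}_u^T(n)\mathbf{w}(n) + \eta_y(n)]$$

Assuming that the step size is small and invoking the independence assumption, it has been shown that

$$E[e^2(n)] = J_{\min}(1 + M) + \xi_1(\sigma_w^2, \mu) + \xi_2(\sigma_{\text{data}}^2)$$

where

– $J_{\min}$: MMSE

– $M$: misadjustment

– $\xi_1(\sigma_w^2, \mu)$: due to $\Delta\mathbf{w}(n)$; inversely proportional to the step size

– $\xi_2(\sigma_{\text{data}}^2)$: due to $\boldsymbol{\eta}_u(n)$ and $\eta_y(n)$; independent of the step size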

Decreasing the step size reduces the misadjustment but increases the effect of the quantization error.
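The quantization points described above can be placed in a small simulation. This is only a sketch under stated assumptions: the rounding quantizer `q`, its saturation to [-1, 1), the word lengths, and the two-tap system being identified are all hypothetical choices for illustration.

```python
import random

def q(x, n_bits):
    """Hypothetical rounding quantizer Q[.] with step 2^-(n_bits-1) on [-1, 1)."""
    step = 2.0 ** -(n_bits - 1)
    x = max(-1.0, min(1.0 - step, x))     # saturate instead of wrapping
    return step * round(x / step)

def lms_fixed_point(u, d, n_taps, mu, data_bits, weight_bits):
    """LMS in which inputs, output, and weight updates all pass through Q[.]."""
    w = [0.0] * n_taps
    errors = []
    for n in range(len(u)):
        # tap-input vector u_q(n) = [u_q(n), u_q(n-1), ...]
        uq = [q(u[n - i], data_bits) if n - i >= 0 else 0.0 for i in range(n_taps)]
        yq = q(sum(wi * ui for wi, ui in zip(w, uq)), data_bits)   # y_q(n)
        eq = q(d[n], data_bits) - yq                               # e_q(n)
        # w_q(n+1) = w_q(n) + Q[mu * e_q(n) * u_q(n)]
        w = [wi + q(mu * eq * ui, weight_bits) for wi, ui in zip(w, uq)]
        errors.append(eq)
    return w, errors

rng = random.Random(1)
u = [rng.uniform(-0.5, 0.5) for _ in range(5000)]
true_w = [0.5, -0.3]                      # hypothetical system to identify
d = [true_w[0] * u[n] + (true_w[1] * u[n - 1] if n > 0 else 0.0)
     for n in range(len(u))]

for weight_bits in (16, 6):
    w, e = lms_fixed_point(u, d, n_taps=2, mu=0.1,
                           data_bits=16, weight_bits=weight_bits)
    mse = sum(x * x for x in e[-1000:]) / 1000
    print(weight_bits, [round(x, 4) for x in w], mse)
```

With a short weight word length, many corrections round to zero, so the residual error stays far larger than with a long word length.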

A digital implementation of the LMS algorithm stops adapting, or stalls, whenever the correction term $\mu e_q(n)u_q(n-i)$ for the ith tap weight is smaller in magnitude than the least significant bit (LSB) of the tap weight:

$$|\mu e_q(n_0)u_q(n_0 - i)| \le \text{LSB}$$

Let the root mean square (rms) value of $u_q(n_0 - i)$ be $A_{\text{rms}}$. Then, if

$$|e_q(n)| \le \frac{\text{LSB}}{\mu A_{\text{rms}}} = e_D(\mu)$$

the LMS algorithm stops adapting. The quantity $e_D(\mu)$ is called the digital residual error.

To prevent the stalling phenomenon, $e_D(\mu)$ must be made as small as possible. This can be achieved in two ways:

– The LSB is reduced by picking a sufficiently large number of bits for the digital representation of each tap weight.

– The step-size parameter is made as large as possible, while still guaranteeing convergence of the algorithm.
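The two remedies above can be seen directly in the expression LSB / (mu * A_rms) for the digital residual error. The sketch below assumes that each tap weight is stored as a signed fixed-point number on [-1, 1), so that LSB = 2^-(B-1) for B bits; the helper names and the numerical values are hypothetical.

```python
def lsb(weight_bits):
    """LSB of a weight stored with weight_bits bits over the range [-1, 1)."""
    return 2.0 ** -(weight_bits - 1)

def digital_residual_error(weight_bits, mu, a_rms):
    """e_D(mu) = LSB / (mu * A_rms): adaptation stalls once |e_q(n)| falls below it."""
    return lsb(weight_bits) / (mu * a_rms)

for bits in (8, 12, 16):
    for mu in (0.01, 0.1):
        print(bits, mu, digital_residual_error(bits, mu, a_rms=0.3))
```

More weight bits or a larger step size both lower the error level at which adaptation stalls.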

There is another numerical problem called parameter drift; that is, tap weights in the LMS algorithm attain arbitrarily large values despite bounded inputs, disturbances, and errors.

To stabilize a digital implementation of the LMS algorithm, we may use the leaky LMS algorithm.

This algorithm provides a compromise between minimizing the MSE and minimizing the energy of the tap-weight vector.

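Leaky LMS algorithm:

The cost function is

$$J(n) = e^2(n) + \alpha\|\mathbf{w}(n)\|^2$$

It can be shown that the update equation is

$$\mathbf{w}(n+1) = (1 - \mu\alpha)\mathbf{w}(n) + \mu e(n)\mathbf{u}(n)$$

where $\alpha$ is a constant satisfying the condition

$$0 \le \alpha < \frac{1}{\mu}$$

Except for the leakage factor $(1 - \mu\alpha)$, the algorithm is the same as the conventional LMS algorithm.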
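The leaky update can be written as a single step function. This is a minimal sketch for real-valued signals; the function name is hypothetical, and the range check on alpha uses the commonly quoted condition 0 <= alpha < 1/mu, which is stated here as an assumption.

```python
def leaky_lms_step(w, u_vec, d, mu, alpha):
    """One leaky-LMS update: w(n+1) = (1 - mu*alpha) w(n) + mu e(n) u(n)."""
    if not 0.0 <= alpha < 1.0 / mu:
        raise ValueError("alpha is assumed to satisfy 0 <= alpha < 1/mu")
    y = sum(wi * ui for wi, ui in zip(w, u_vec))    # filter output
    e = d - y                                       # estimation error e(n)
    leak = 1.0 - mu * alpha                         # leakage factor
    w_next = [leak * wi + mu * e * ui for wi, ui in zip(w, u_vec)]
    return w_next, e

# With no input excitation, the leakage pulls the weights toward zero,
# which is what counteracts parameter drift.
w = [1.0, -2.0]
for _ in range(3):
    w, _ = leaky_lms_step(w, [0.0, 0.0], 0.0, mu=0.1, alpha=0.5)
    print(w)
```

Setting `alpha=0.0` reduces the step to the conventional LMS update.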

Note that including the leakage factor has the equivalent effect of adding a white-noise sequence of zero mean and variance $\alpha$ to the input process.

This suggests another way of stabilizing a digital implementation of the LMS algorithm. A relatively weak white-noise sequence of variance $\alpha$, known as dither, is added to the input process u(n), and samples of the combination are then used as tap inputs.

For the RLS algorithm, there is a numerical instability problem to be considered when it is implemented in finite-precision arithmetic.

Divergence of the RLS algorithm occurs primarily because the matrix P(n) loses its positive definiteness or its Hermitian symmetry.

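The Hermitian symmetry preserving RLS algorithm:

Initialization: $\mathbf{P}(0) = \delta^{-1}\mathbf{I}$, $\mathbf{w}(0) = \mathbf{0}$

For $n = 1, 2, \ldots$, compute

$$\boldsymbol{\pi}(n) = \mathbf{P}(n-1)\mathbf{u}(n)$$

$$r(n) = \frac{1}{\lambda + \mathbf{u}^H(n)\boldsymbol{\pi}(n)}$$

$$\mathbf{k}(n) = r(n)\boldsymbol{\pi}(n)$$

$$\xi(n) = d(n) - \mathbf{w}^H(n-1)\mathbf{u}(n)$$

$$\mathbf{w}(n) = \mathbf{w}(n-1) + \mathbf{k}(n)\xi^*(n)$$

$$\mathbf{P}(n) = \text{Tri}\{\lambda^{-1}[\mathbf{P}(n-1) - \mathbf{k}(n)\boldsymbol{\pi}^H(n)]\}$$

Tri{.}: computes only the upper or lower triangular part of {.} and fills in the rest of the matrix to preserve Hermitian symmetry.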
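For real-valued data, the Hermitian transpose becomes an ordinary transpose and the conjugate disappears, so the recursion can be sketched in plain Python. The function names, the default values of `lam` and `delta`, and the two-tap test system are assumptions made for illustration only.

```python
import random

def tri(a):
    """Tri{.}: keep the upper triangle and mirror it into the lower triangle."""
    m = len(a)
    return [[a[min(i, j)][max(i, j)] for j in range(m)] for i in range(m)]

def rls_symmetric(u_vecs, d, lam=0.99, delta=0.01):
    """Real-valued RLS with the symmetry-preserving update of P(n)."""
    m = len(u_vecs[0])
    p = [[(1.0 / delta if i == j else 0.0) for j in range(m)] for i in range(m)]
    w = [0.0] * m
    for u, dn in zip(u_vecs, d):
        # pi(n) = P(n-1) u(n)
        pi = [sum(p[i][j] * u[j] for j in range(m)) for i in range(m)]
        # r(n) = 1 / (lambda + u^T(n) pi(n))
        r = 1.0 / (lam + sum(ui * pii for ui, pii in zip(u, pi)))
        k = [r * pii for pii in pi]                       # gain k(n)
        xi = dn - sum(wi * ui for wi, ui in zip(w, u))    # a priori error
        w = [wi + ki * xi for wi, ki in zip(w, k)]
        # P(n) = Tri{ (P(n-1) - k(n) pi^T(n)) / lambda }
        p = tri([[(p[i][j] - k[i] * pi[j]) / lam for j in range(m)]
                 for i in range(m)])
    return w, p

rng = random.Random(2)
x = [rng.uniform(-0.5, 0.5) for _ in range(500)]
u_vecs = [[x[n], x[n - 1] if n > 0 else 0.0] for n in range(len(x))]
d = [0.5 * u[0] - 0.3 * u[1] for u in u_vecs]             # hypothetical system
w, p = rls_symmetric(u_vecs, d)
print([round(wi, 4) for wi in w])
```

Because `tri` copies one triangle into the other, P(n) stays exactly symmetric regardless of round-off in the individual entries.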

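The stalling phenomenon is directly linked to the forgetting factor and the input signal variance. As we know, for large n,

$$E[\boldsymbol{\Phi}(n)] \approx \frac{1}{1-\lambda}\mathbf{R}$$

Thus, with $\mathbf{R} = \sigma_u^2\mathbf{I}$,

$$E[\mathbf{P}(n)] = E[\boldsymbol{\Phi}^{-1}(n)] \approx (1-\lambda)\mathbf{R}^{-1} = \frac{1-\lambda}{\sigma_u^2}\mathbf{I}$$

If the forgetting factor is close to one and/or the input data variance is large, P(n) becomes very small and the RLS algorithm may stall.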

The FIR filter plays a fundamental role in digital signal processing.

How to implement an FIR filter is of great concern. For an ASIC solution, we have to consider:

– Throughput (speed)

– Gate count (area)

– Power consumption

– Delay

– Modular structure

For the adaptive FIR filter, we have similar concerns.

Conventionally, there are two structures for FIR filters, namely the direct form and the transposed form. If the filter's coefficients are fixed, these two forms are equivalent.

Transversal and transposed form:

The equivalence of these two forms can be easily verified using the retiming technique.
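The equivalence of the two forms for fixed coefficients can also be checked numerically. The following is a minimal sketch with hypothetical function names: the direct form keeps a delay line on the input, while the transposed form keeps its registers in the accumulation path.

```python
def fir_direct(x, w):
    """Direct (transversal) form: delay line on the input, one sum per sample."""
    delay = [0.0] * len(w)
    y = []
    for xn in x:
        delay = [xn] + delay[:-1]                 # shift the input delay line
        y.append(sum(wk * dk for wk, dk in zip(w, delay)))
    return y

def fir_transposed(x, w):
    """Transposed form: input broadcast to all taps, delays between the adders."""
    n = len(w)
    state = [0.0] * (n - 1)                       # registers between the adders
    y = []
    for xn in x:
        y.append(w[0] * xn + (state[0] if n > 1 else 0.0))
        # state[k] <- w[k+1] * x(n) + old state[k+1]
        state = [w[k + 1] * xn + (state[k + 1] if k + 1 < n - 1 else 0.0)
                 for k in range(n - 1)]
    return y

x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [0.5, -0.25, 0.125]
print(fir_direct(x, w))
print(fir_transposed(x, w))
```

Both functions compute the same convolution; only the placement of the delays differs, which is what retiming exploits.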


Retiming:

The retiming technique is simple, yet it is very useful for deriving new pipeline structures.

The disadvantage of the direct form (DF) is its low speed (needs to complete many multiplications in one clock cycle).

The transposed form (TF) can overcome this problem. However, the input has to drive all the weights (fan-out problem).

To solve the low-speed problem in the DF, we can use the idea of pipelining (inserting delays).

To solve the driving problem in the TF, we can insert delays between the weights.

This approach also results in a modular structure called systolic architecture.

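Pipelined processing:

[Equations not recoverable from the source: the output y(n) is computed from x(n), a(n), and b(n); in the pipelined version, an intermediate signal z(n) is first formed from x(n) and a(n), and y(n) is then formed from z and b, with one-sample delays inserted between the two stages.]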

The results:

W1 is obtained by inserting delays in the TF, while W2 is obtained by inserting delays in the DF.

The transfer functions:

$$\text{For } W_1:\quad \frac{Y(z)}{X(z)} = \sum_{k=0}^{N-1} w_k z^{-2k}$$

$$\text{For } W_2:\quad \frac{Y(z)}{X(z)} = z^{-N}\sum_{k=0}^{N-1} w_k z^{-k}$$

The clock rate for structure (a) must be doubled to keep the same throughput. Structure (b) has N extra delays.

Thus, these two structures are less attractive.

The hybrid form (HF):

– Pipelined structure without extra latency

– Trades off speed against area and power consumption

The hybrid form I is obtained from the DF while II is from the TF (using retiming).

When the filter weights are time-varying, the TF has the weight delay problem. This may change the behavior of the original LMS algorithm.

It is possible to "equalize" the delays in the weights. In other words, w0 is not delayed, w1 is delayed by one clock cycle, ..., and w4 is delayed by four clock cycles. However, this increases the number of delays.

Besides the coefficient delay, the TF has the highest power consumption, since the number of bits in the output path (accumulation path), where its delays are located, is usually much greater than the number of bits in the input path.

The hybrid form serves as a compromise between the DF and the TF.

The adaptive FIR filter can also be implemented in the DF, TF, and HF.

The DF adaptive LMS filter:

The DF and HF:

To obtain pipelining, we have to delay the filter weights; as a result, the LMS algorithm has to be modified:

$$\mathbf{w}(n) = \mathbf{w}(n-1) + \mu e(n-D)\mathbf{x}(n-D)$$

This is called the delayed LMS algorithm.

The delayed LMS algorithm is a special case of the delay relaxation algorithm described below.

To obtain an adaptive filter with a deeper pipeline structure, we can apply architecture and algorithm transformations.

Architecture transformation:

– Pipelining

– Parallelism

– Retiming
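The delayed LMS update mentioned above, in which the correction applied at time n uses the error and input from D samples earlier, can be sketched as follows. The function name, the step size, and the two-tap system being identified are hypothetical; the error e(n) is assumed to be computed with the weights available before the update at time n.

```python
import random

def delayed_lms(x, d, n_taps, mu, delay):
    """Delayed LMS: w(n) = w(n-1) + mu * e(n-D) * x(n-D), with D = delay."""
    w = [0.0] * n_taps
    history = []                       # (e(n), x(n)) pairs waiting to be used
    errors = []
    for n in range(len(x)):
        x_vec = [x[n - i] if n - i >= 0 else 0.0 for i in range(n_taps)]
        e = d[n] - sum(wi * xi for wi, xi in zip(w, x_vec))
        errors.append(e)
        history.append((e, x_vec))
        if len(history) > delay:       # the gradient from D samples ago is ready
            e_old, x_old = history.pop(0)
            w = [wi + mu * e_old * xi for wi, xi in zip(w, x_old)]
    return w, errors

rng = random.Random(3)
x = [rng.uniform(-0.5, 0.5) for _ in range(5000)]
d = [0.5 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]

for delay in (0, 4):
    w, _ = delayed_lms(x, d, n_taps=2, mu=0.05, delay=delay)
    print(delay, [round(wi, 4) for wi in w])
```

With `delay=0` the function reduces to the conventional LMS algorithm; for a small step size, the delayed version converges to the same weights.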

Algorithm transformation:

– Look-ahead

– Relaxed look-ahead

Parallelism:

Retiming:

Look-ahead:

$$\mathbf{w}(n) = \mathbf{w}(n-1) + \mu e(n)\mathbf{x}(n)$$

M-stage look-ahead:

$$\mathbf{w}(n) = \mathbf{w}(n-M) + \mu\sum_{i=0}^{M-1} e(n-i)\mathbf{x}(n-i)$$

Thus, the clock rate can be increased M times, and M clock cycles are used to finish a multiplication (M × 1/M = 1).

Relaxed look-ahead:

– It is derived from the look-ahead and serves as an approximation to the original adaptive algorithm.

Three kinds of relaxed look-ahead:

– Delay relaxation

– Sum relaxation

– Delay transfer relaxation

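Example (look-ahead):

$$x(n+1) = a(n)x(n) + b(n)u(n)$$

$$x(n+2) = a(n+1)x(n+1) + b(n+1)u(n+1)$$

$$= a(n+1)a(n)x(n) + a(n+1)b(n)u(n) + b(n+1)u(n+1)$$

[Figure: look-ahead realization with inputs a(n+2), b(n+2), and u(n+2), output x(n+3), unit delays D, and a 3D delay from x(n+3) back to x(n). The extra delays can be used for retiming.]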
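The look-ahead example above can be verified numerically: iterating the two-step form must give the same state sequence as the original one-step recursion. This is a small sketch with hypothetical function names and arbitrary test data.

```python
def run_original(a, b, u, x0):
    """x(n+1) = a(n) x(n) + b(n) u(n): one state update per sample."""
    x = [x0]
    for n in range(len(a)):
        x.append(a[n] * x[n] + b[n] * u[n])
    return x

def run_two_step_look_ahead(a, b, u, x0):
    """x(n+2) = a(n+1)a(n) x(n) + a(n+1)b(n) u(n) + b(n+1) u(n+1)."""
    x = {0: x0, 1: a[0] * x0 + b[0] * u[0]}      # two initial states are needed
    for n in range(len(a) - 1):
        x[n + 2] = (a[n + 1] * a[n] * x[n]
                    + a[n + 1] * b[n] * u[n]
                    + b[n + 1] * u[n + 1])
    return [x[n] for n in range(len(a) + 1)]

a = [0.5, -0.25, 0.75, 0.1, 0.9]
b = [1.0, 0.5, -1.0, 2.0, 0.3]
u = [1.0, -2.0, 0.5, 0.0, 3.0]
print(run_original(a, b, u, 0.2))
print(run_two_step_look_ahead(a, b, u, 0.2))
```

In the look-ahead form, x(n+2) depends on x(n) rather than x(n+1), so the feedback loop contains two delays; the extra delay is what can be moved by retiming to pipeline the loop.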

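Delay relaxation:

– Assume that the gradient does not change significantly during M1 samples.

$$\mathbf{w}(n) = \mathbf{w}(n-M) + \mu\sum_{i=0}^{M-1} e(n-i)\mathbf{x}(n-i)$$

$$\Rightarrow\ \mathbf{w}(n) = \mathbf{w}(n-M) + \mu\sum_{i=0}^{M-1} e(n-M_1-i)\mathbf{x}(n-M_1-i)$$

Sum relaxation:

– The M terms in the gradient estimate are approximated by M' (M' < M) terms.

$$\mathbf{w}(n) = \mathbf{w}(n-M) + \mu\sum_{i=0}^{M-1} e(n-M_1-i)\mathbf{x}(n-M_1-i)$$

$$\Rightarrow\ \mathbf{w}(n) = \mathbf{w}(n-M) + \mu\sum_{i=0}^{M'-1} e(n-M_1-i)\mathbf{x}(n-M_1-i)$$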
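The relation between the exact look-ahead and its relaxed versions can be illustrated by evaluating the right-hand side of the update from recorded histories of a serial LMS run. This is only a sketch: the function names and the test system are hypothetical, and the code evaluates the expressions on recorded data rather than modeling a pipelined architecture.

```python
import random

def serial_lms(x, d, n_taps, mu):
    """Serial LMS, w(n) = w(n-1) + mu e(n) x(n); returns w, e, x histories."""
    w = [0.0] * n_taps
    w_hist, e_hist, x_hist = [], [], []
    for n in range(len(x)):
        x_vec = [x[n - i] if n - i >= 0 else 0.0 for i in range(n_taps)]
        e = d[n] - sum(wi * xi for wi, xi in zip(w, x_vec))
        w = [wi + mu * e * xi for wi, xi in zip(w, x_vec)]
        w_hist.append(w)
        e_hist.append(e)
        x_hist.append(x_vec)
    return w_hist, e_hist, x_hist

def relaxed_look_ahead_rhs(w_hist, e_hist, x_hist, n, mu, m, m1=0, m_prime=None):
    """w(n-M) + mu * sum_{i=0}^{M'-1} e(n-M1-i) x(n-M1-i)."""
    m_prime = m if m_prime is None else m_prime
    w = list(w_hist[n - m])
    for i in range(m_prime):
        k = n - m1 - i
        w = [wi + mu * e_hist[k] * xi for wi, xi in zip(w, x_hist[k])]
    return w

rng = random.Random(4)
x = [rng.uniform(-0.5, 0.5) for _ in range(200)]
d = [0.5 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
mu = 0.05
w_hist, e_hist, x_hist = serial_lms(x, d, n_taps=2, mu=mu)

n = 50
exact = relaxed_look_ahead_rhs(w_hist, e_hist, x_hist, n, mu, m=4)
relaxed = relaxed_look_ahead_rhs(w_hist, e_hist, x_hist, n, mu, m=4, m1=2, m_prime=2)
print(w_hist[n], exact, relaxed)
```

With `m1=0` and `m_prime=m`, the result equals the serial LMS weights w(n) up to round-off; with delay and sum relaxation it is only an approximation, which is the price paid for the shorter critical path.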

Delay transfer relaxation:

[Figure: two block diagrams, labeled "Delay relaxation" and "Delay transfer", involving the delayed signal a(n-D1).]

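For example:

[Figure: adaptive filter block diagram with input x(n), desired signal d(n-5), error e(n), weights w(n-1) and w(n), unit delays D, and two 5D delay blocks.]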

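Retiming:

[Figure: the same block diagram after retiming, with input x(n), desired signal d(n-5), error e(n), weights w(n-1) and w(n), one 5D and one 4D delay block, and additional unit delays D.]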

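Retiming/Pipelining:

[Figure: the retimed and pipelined block diagram, with input x(n), desired signal d(n-2), a 3D delay block, and unit delays D distributed along the signal paths.]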
