1
11. Finite-Precision Effects and
Pipeline Adaptive Filters
In practice, an adaptive filter is usually implemented digitally, so finite-precision problems arise.
There are two ways to implement a filter: fixed-point or floating-point arithmetic. In most cases, a fixed-point implementation is preferred.
In a digital implementation, there are essentially two sources of quantization:
– A/D conversion
– Finite word-length arithmetic
The signal is usually received in analog form; the A/D converter is the device that samples and quantizes it.
2
Ideal A/D converter:
As we can see, quantization error always exists. If the number of quantization levels is large, we can assume the error is uniformly distributed and treat it as additive noise.
3
Calculation of quantization noise:
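An N-bit ADC covering a full-scale range of 2 has a quantization step $\Delta = 2/2^N$, so the quantization error lies between $-1/2^N$ and $1/2^N$ (that is, $\pm\Delta/2 = \pm\tfrac{1}{2}\times 2/2^N$).
With the error uniformly distributed over this interval, the average quantization power is
$$\sigma_q^2 = \frac{1}{\Delta}\int_{-\Delta/2}^{\Delta/2} e^2\,de = \frac{\Delta^2}{12}$$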
4
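Assuming that the dynamic range of the ADC and DAC is between -1 and 1 (the maximum magnitude is 0 dB), the quantization noise power is
$$10\log_{10}\frac{(2/2^N)^2}{12} = 10\log_{10}\frac{2^{-2N}}{3} \approx -(6.02N + 4.77)\ \text{dB}$$
Thus, we have a 6-dB rule of thumb for the quantization noise: each additional bit lowers the noise power by about 6 dB.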
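The 6-dB rule can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper names (`quantize`, `measured_noise_db`, `predicted_noise_db`) are hypothetical, the input is assumed to be uniformly distributed over [-1, 1], the quantizer rounds to the nearest level, and clipping at full scale is ignored.

```python
import math
import random

def quantize(x, n_bits):
    """Round x to the nearest level of a uniform grid with step 2 / 2**n_bits."""
    step = 2.0 / (2 ** n_bits)
    return round(x / step) * step

def measured_noise_db(n_bits, num_samples=100_000, seed=0):
    """Average power (in dB) of the quantization error for a full-scale uniform input."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)
        err = quantize(x, n_bits) - x
        total += err * err
    return 10.0 * math.log10(total / num_samples)

def predicted_noise_db(n_bits):
    """Uniform-error model: 10*log10(step**2 / 12) with step = 2 / 2**n_bits."""
    step = 2.0 / (2 ** n_bits)
    return 10.0 * math.log10(step * step / 12.0)

if __name__ == "__main__":
    for bits in (8, 12, 16):
        print(bits, measured_noise_db(bits), predicted_noise_db(bits))
```

Under these assumptions the measured and predicted values should agree closely, and the predicted value drops by about 6.02 dB for every extra bit.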
In a digital system, a finite word length is commonly used to store the results of internal arithmetic calculations. Thus, after an arithmetic operation (addition, multiplication), the result must be quantized, which produces the round-off or truncation effect.
Due to the above effects, the digital version of the filter may exhibit a response that deviates from the ideal one.
5
For a digitally realized LMS adaptive filter, there are many sources that will introduce quantization errors.
For the input quantizer connected to input u(n), we have
$$\mathbf{u}_q(n) = Q[\mathbf{u}(n)] = \mathbf{u}(n) + \boldsymbol{\eta}_u(n)$$
For the quantizer connected to the desired signal d(n), we have
$$d_q(n) = Q[d(n)] = d(n) + \eta_d(n)$$
For the quantized tap-weight vector, we have
$$\mathbf{w}_q(n) = Q[\mathbf{w}(n)] = \mathbf{w}(n) + \Delta\mathbf{w}(n)$$
For the filter output, we have
$$y_q(n) = Q[\mathbf{u}_q^T(n)\,\mathbf{w}_q(n)] = \mathbf{u}_q^T(n)\,\mathbf{w}_q(n) + \eta_y(n)$$
6
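The finite-precision LMS algorithm is described by
$$e_q(n) = d_q(n) - y_q(n)$$
$$\mathbf{w}_q(n+1) = \mathbf{w}_q(n) + Q[\mu\, e_q(n)\,\mathbf{u}_q(n)]$$
Thus, the error is then
$$e_q(n) = d_q(n) - y_q(n) = d_q(n) - \mathbf{u}_q^T(n)\,\mathbf{w}_q(n) - \eta_y(n)$$
$$\approx [d(n) - \mathbf{u}^T(n)\,\mathbf{w}(n)] - [\Delta\mathbf{w}^T(n)\,\mathbf{u}(n) + \boldsymbol{\eta}_u^T(n)\,\mathbf{w}(n)] + [\eta_d(n) - \eta_y(n)]$$
where the second-order term $\boldsymbol{\eta}_u^T(n)\,\Delta\mathbf{w}(n)$ is neglected.
Assuming that the step size is small and invoking the independence assumption, it has been shown that
$$E[e_q^2(n)] = J_{\min}(1 + \mathcal{M}) + \xi_1(\sigma_w^2, \mu) + \xi_2(\sigma_{data}^2)$$
– $J_{\min}$: MMSE
– $\mathcal{M}$: misadjustment
– $\xi_1(\sigma_w^2, \mu)$: due to $\Delta\mathbf{w}$; inversely proportional to the step size
– $\xi_2(\sigma_{data}^2)$: due to $\boldsymbol{\eta}_u(n)$ and $\eta_y(n)$; independent of the step size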
7
Decreasing the step size reduces the misadjustment but increases the effect of quantization error.
A digital implementation of the LMS algorithm stops adapting, or stalls, whenever the correction term $\mu\, e_q(n)\, u_q(n-i)$ for the ith tap weight is smaller in magnitude than the least significant bit (LSB) of the tap weight:
$$|\mu\, e_q(n_0)\, u_q(n_0 - i)| \le \text{LSB}$$
Let the root-mean-square (rms) value of $u_q(n_0 - i)$ be $A_{rms}$. Then, if
$$|e_q(n)| \le \frac{\text{LSB}}{\mu A_{rms}} = e_D(\mu)$$
the LMS algorithm stops adapting. The quantity $e_D(\mu)$ is called the digital residual error.
8
To prevent the stalling phenomenon, $e_D(\mu)$ must be made as small as possible. This can be achieved in two ways:
– The LSB is reduced by picking a sufficiently large number of bits for the digital representation of each tap weight.
– The step-size parameter $\mu$ is made as large as possible, while still guaranteeing convergence of the algorithm.
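The stalling condition can be illustrated with a few lines of code. This is a minimal sketch, not the slides' own formulation: the function names are hypothetical, and the quantizer is assumed to truncate the correction term toward zero to a multiple of the LSB (the source does not state the rounding mode).

```python
def digital_residual_error(lsb, mu, a_rms):
    """e_D(mu) = LSB / (mu * A_rms): error level below which updates are lost."""
    return lsb / (mu * a_rms)

def quantized_correction(mu, e_q, u_q, lsb):
    """Q[mu * e_q * u_q], assuming truncation toward zero to a multiple of lsb."""
    return int(mu * e_q * u_q / lsb) * lsb

def update_stalls(mu, e_q, u_q, lsb):
    """True when the quantized correction is zero, so the tap weight cannot change."""
    return quantized_correction(mu, e_q, u_q, lsb) == 0

if __name__ == "__main__":
    lsb = 2.0 ** -15          # assumed 16-bit fractional tap weights
    mu, a_rms = 0.01, 0.5     # illustrative step size and rms input level
    print("e_D(mu) =", digital_residual_error(lsb, mu, a_rms))
    for e_q in (0.005, 0.01):
        print(e_q, "stalls" if update_stalls(mu, e_q, a_rms, lsb) else "adapts")
```

Both remedies listed above show up directly in `digital_residual_error`: a smaller `lsb` or a larger `mu` lowers the error level at which adaptation stops.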
There is another numerical problem called parameter drift; that is, tap weights in the LMS algorithm attain arbitrarily large values despite bounded inputs, disturbances, and errors.
To stabilize the digital implementation of the LMS algorithm, we may use the leaky LMS algorithm.
This algorithm provides a compromise between minimizing the MSE and minimizing the filter power.
9
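Leaky LMS algorithm:
The cost function is
$$J(n) = e^2(n) + \alpha\,\|\mathbf{w}(n)\|^2$$
It can be shown that the update equation is
$$\mathbf{w}(n+1) = (1 - \mu\alpha)\,\mathbf{w}(n) + \mu\, e(n)\,\mathbf{u}(n)$$
where $\alpha$ is a constant satisfying the condition
$$0 \le \alpha < \frac{1}{\mu}$$
Except for the leakage factor $(1 - \mu\alpha)$, the algorithm is the same as the conventional LMS algorithm.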
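The leaky update can be sketched for real-valued signals as follows. The function name `leaky_lms_step` and its argument layout are illustrative assumptions; with `alpha = 0` the step reduces to the conventional LMS update.

```python
def leaky_lms_step(w, u, d, mu, alpha):
    """One real-valued leaky LMS update.

    w: current tap weights, u: current tap inputs, d: desired sample.
    Returns (new_w, e) with new_w = (1 - mu*alpha) * w + mu * e * u.
    """
    if mu <= 0.0 or not 0.0 <= alpha < 1.0 / mu:
        raise ValueError("need mu > 0 and 0 <= alpha < 1/mu")
    y = sum(wi * ui for wi, ui in zip(w, u))      # filter output
    e = d - y                                     # estimation error
    leak = 1.0 - mu * alpha                       # leakage factor
    new_w = [leak * wi + mu * e * ui for wi, ui in zip(w, u)]
    return new_w, e

if __name__ == "__main__":
    w = [1.0, 2.0]
    # With zero input and zero desired signal the weights decay geometrically,
    # which is what keeps them from drifting.
    for _ in range(3):
        w, _ = leaky_lms_step(w, [0.0, 0.0], 0.0, mu=0.1, alpha=1.0)
        print(w)
```

The decay with zero input shows why leakage counters parameter drift: the weights are pulled toward zero whenever the data do not support them.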
10
Note that the inclusion of the leakage factor has the equivalent effect of adding a white-noise sequence of zero mean and variance $\alpha$ to the input process.
This suggests another way of stabilizing a digital implementation of the LMS algorithm: a relatively weak white-noise sequence of variance $\alpha$, known as dither, is added to the input process u(n), and samples of the combination are then used as tap inputs.
For the RLS algorithm, there is a numerical instability problem to be considered when it is implemented in finite-precision arithmetic.
Divergence of the RLS algorithm occurs primarily because the matrix P(n) loses its positive definiteness or its Hermitian symmetry.
11
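The Hermitian symmetry preserving RLS algorithm:
Initialization:
$$\mathbf{P}(0) = \delta^{-1}\mathbf{I}, \qquad \mathbf{w}(0) = \mathbf{0}$$
For $n = 1, 2, \ldots$, compute
$$\boldsymbol{\pi}(n) = \mathbf{P}(n-1)\,\mathbf{u}(n)$$
$$r(n) = \frac{1}{\lambda + \mathbf{u}^H(n)\,\boldsymbol{\pi}(n)}$$
$$\mathbf{k}(n) = r(n)\,\boldsymbol{\pi}(n)$$
$$\xi(n) = d(n) - \mathbf{w}^H(n-1)\,\mathbf{u}(n)$$
$$\mathbf{w}(n) = \mathbf{w}(n-1) + \mathbf{k}(n)\,\xi^*(n)$$
$$\mathbf{P}(n) = \mathrm{Tri}\{\lambda^{-1}[\mathbf{P}(n-1) - \mathbf{k}(n)\,\boldsymbol{\pi}^H(n)]\}$$
Tri{.}: compute only the upper or lower triangular part of {.}, and fill in the rest of the matrix to preserve Hermitian symmetry.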
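The recursion above can be sketched in plain Python with built-in complex numbers. This is an illustration under assumptions, not a reference implementation: the names `tri_hermitian`, `rls_step`, and `identify` are hypothetical, the upper triangle is the one that is kept, the diagonal is forced to be real, and only the real part of $\mathbf{u}^H\boldsymbol{\pi}$ is used in $r(n)$.

```python
import random

def tri_hermitian(a):
    """Keep the upper triangle of a and mirror its conjugate into the lower one."""
    m = len(a)
    out = [[0j] * m for _ in range(m)]
    for i in range(m):
        out[i][i] = complex(a[i][i].real, 0.0)    # Hermitian => real diagonal
        for j in range(i + 1, m):
            out[i][j] = a[i][j]
            out[j][i] = a[i][j].conjugate()
    return out

def rls_step(P, w, u, d, lam):
    """One RLS iteration; returns (P(n), w(n), xi(n))."""
    m = len(w)
    pi = [sum(P[i][j] * u[j] for j in range(m)) for i in range(m)]   # P(n-1) u(n)
    quad = sum(u[i].conjugate() * pi[i] for i in range(m))           # u^H pi
    r = 1.0 / (lam + quad.real)
    k = [r * p for p in pi]                                          # gain vector
    xi = d - sum(w[i].conjugate() * u[i] for i in range(m))          # a priori error
    w_new = [w[i] + k[i] * xi.conjugate() for i in range(m)]
    full = [[(P[i][j] - k[i] * pi[j].conjugate()) / lam for j in range(m)]
            for i in range(m)]
    return tri_hermitian(full), w_new, xi

def identify(w_true, num_samples=200, lam=0.99, delta=0.01, seed=1):
    """Noise-free system identification demo with d(n) = w_true^H u(n)."""
    rng = random.Random(seed)
    m = len(w_true)
    P = [[(1.0 / delta if i == j else 0.0) for j in range(m)] for i in range(m)]
    w = [0.0] * m
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(m)]
        d = sum(wt.conjugate() * ui for wt, ui in zip(w_true, u))
        P, w, _ = rls_step(P, w, u, d, lam)
    return P, w

if __name__ == "__main__":
    P, w = identify([0.5, -0.3])
    print(w)
```

Because `tri_hermitian` rebuilds the lower triangle from the upper one at every step, round-off cannot accumulate into an asymmetry of P(n), which is the point of the Tri{.} operator.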
12
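The stalling phenomenon is directly linked to the forgetting factor $\lambda$ and the input signal variance $\sigma_u^2$. As we know, for large n,
$$E[\mathbf{\Phi}(n)] \approx \frac{\mathbf{R}}{1-\lambda}$$
Thus,
$$E[\mathbf{P}(n)] = E[\mathbf{\Phi}^{-1}(n)] \approx (1-\lambda)\,\mathbf{R}^{-1}$$
Since $\mathbf{R}$ scales with the input variance $\sigma_u^2$, the entries of $\mathbf{P}(n)$ become very small when $\lambda$ is close to one or $\sigma_u^2$ is large. If the forgetting factor is close to one and/or the input data variance is large, the RLS algorithm may stall.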
13
The FIR filter plays a fundamental role in digital signal processing.
How to implement an FIR filter is of great concern. For an ASIC solution, we have to consider:
– Throughput (speed)
– Gate count (area)
– Power consumption
– Delay
– Modular structure
For the adaptive FIR filter, we have similar concerns.
Conventionally, there are two structures for FIR filters, namely the direct form and the transposed form. If the filter's coefficients are fixed, these two forms are equivalent.
14
Transversal and transposed form:
The equivalence of these two forms can be easily verified using the retiming technique.
15
Retiming:
The retiming technique is simple, yet it is very useful for deriving new pipelined structures.
16
The disadvantage of the direct form (DF) is its low speed: many multiplications and additions must be completed in one clock cycle.
The transposed form (TF) can overcome this problem. However, the input has to drive all the weights (a fan-out problem).
To solve the low-speed problem in the DF, we can use the idea of pipelining (inserting delays).
To solve the driving problem in the TF, we can insert delays between weights.
This approach also results in a modular structure called a systolic architecture.
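The equivalence of the two forms for fixed coefficients can be checked with a small behavioral model. This is only a sketch; the function names are hypothetical, and each loop iteration stands for one clock cycle.

```python
def fir_direct_form(w, x):
    """Direct form: tapped delay line on the input, all products summed at once."""
    taps = [0.0] * len(w)
    y = []
    for sample in x:
        taps = [sample] + taps[:-1]               # shift the input delay line
        y.append(sum(wk * tk for wk, tk in zip(w, taps)))
    return y

def fir_transposed_form(w, x):
    """Transposed form: input broadcast to every multiplier, delays in the adder chain."""
    regs = [0.0] * (len(w) - 1)                   # registers between the adders
    y = []
    for sample in x:
        products = [wk * sample for wk in w]      # the input drives all weights
        y.append(products[0] + (regs[0] if regs else 0.0))
        # Each register takes its own product plus the previous value of the next register.
        regs = [products[k + 1] + (regs[k + 1] if k + 1 < len(regs) else 0.0)
                for k in range(len(regs))]
    return y

if __name__ == "__main__":
    w = [1.0, 2.0, 3.0]
    x = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0]
    print(fir_direct_form(w, x))
    print(fir_transposed_form(w, x))
```

In the direct form the critical path is the whole sum inside one iteration, whereas in the transposed form each register is fed by one multiplication and one addition; the price is that `sample` is broadcast to every multiplier.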
17
Pipelined processing:
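[Equations: without pipelining, y(n) is computed from x(n), a(n), and b(n) within one clock cycle. With pipelining, an intermediate signal z(n) is computed from x(n) and a(n) and stored in a register; the output is then computed from the delayed z and b in the following clock cycle.]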
18
The results:
19
W1 is obtained by inserting delays in the TF, while W2 is obtained by inserting delays in the DF.
The transfer functions:
$$\text{For } W_1:\quad \frac{Y(z)}{X(z)} = z^{-1}\sum_{k=0}^{N-1} w_k\, z^{-2k}$$
$$\text{For } W_2:\quad \frac{Y(z)}{X(z)} = z^{-N}\sum_{k=0}^{N-1} w_k\, z^{-k}$$
The clock rate for structure (a) must be doubled to keep the same throughput. Structure (b) has N extra delays.
Thus, these two structures are less attractive.
20
The hybrid form (HF):
– Pipelined structure without extra latency.
– Trades off speed against area/power consumption.
21
The hybrid form I is obtained from the DF, while the hybrid form II is obtained from the TF (using retiming).
When the filter weights are time-varying, the TF has the weight delay problem. This may change the behavior of the original LMS algorithm.
It is possible to "equalize" the delays in the weights. In other words, w0 is not delayed, w1 is delayed by one clock cycle, ..., w4 is delayed by 4 clock cycles. However, this will increase the number of delays.
Besides the coefficient delay, the TF has the highest power consumption, since the number of bits in the output path (accumulation path) is usually much greater than that in the input path.
The hybrid form serves as a compromise between the DF and the TF.
22
The adaptive FIR filter can likewise be implemented in the DF, TF, or HF.
The DF adaptive LMS filter:
23
The DF and HF:
24
To obtain pipelining, we have to delay the filter weights; as a result, the LMS algorithm has to be modified:
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\, e(n-D)\,\mathbf{x}(n-D)$$
This is called the delayed LMS algorithm.
The delayed LMS algorithm is a special case of the delay relaxation algorithm described below.
To obtain an adaptive filter with a deeper pipeline, we can apply architecture/algorithm transformations.
Architecture transformation:
– Pipelining
– Parallelism
– Retiming
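A behavioral sketch of the delayed LMS update is given below. The names `delayed_lms` and `identify` are hypothetical, the signals are assumed real-valued, and the test signal (uniform noise through a two-tap system, no measurement noise) is chosen only for illustration. With `delay = 0` the loop reduces to the conventional LMS algorithm.

```python
import random
from collections import deque

def delayed_lms(x, d, num_taps, mu, delay):
    """Delayed LMS: w(n+1) = w(n) + mu * e(n-D) * x(n-D). Returns (w, errors)."""
    w = [0.0] * num_taps
    pending = deque(maxlen=delay + 1)     # holds (e, x-vector) for the last D+1 samples
    errors = []
    for n in range(len(x)):
        xn = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        e = d[n] - sum(wi * xi for wi, xi in zip(w, xn))
        errors.append(e)
        pending.append((e, xn))
        if len(pending) == delay + 1:     # e(n-D) and x(n-D) are now available
            e_old, x_old = pending[0]
            w = [wi + mu * e_old * xi for wi, xi in zip(w, x_old)]
    return w, errors

def identify(w_true, delay, mu=0.1, num_samples=4000, seed=0):
    """Identify a fixed FIR system w_true from noise-free data."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
    d = [sum(w_true[k] * (x[n - k] if n - k >= 0 else 0.0)
             for k in range(len(w_true))) for n in range(num_samples)]
    return delayed_lms(x, d, len(w_true), mu, delay)

if __name__ == "__main__":
    for D in (0, 3):
        w, _ = identify([0.4, -0.2], delay=D)
        print(D, w)
```

The D extra delays in the update path are what a hardware designer can later redistribute by retiming; the cost is a tighter bound on the usable step size.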
25
Algorithm transformation:
– Look-ahead
– Relaxed look-ahead
Parallelism:
26
Retiming:
27
Look-ahead:
$$\mathbf{w}(n) = \mathbf{w}(n-1) + \mu\, e(n)\,\mathbf{x}(n)$$
M-stage look-ahead:
$$\mathbf{w}(n) = \mathbf{w}(n-M) + \mu\sum_{i=0}^{M-1} e(n-i)\,\mathbf{x}(n-i)$$
Thus, the clock rate can be increased M times, and M clock cycles are available to finish a multiplication (M × 1/M = 1).
Relaxed look-ahead:
– It is derived from the look-ahead and serves as an approximation to the original adaptive algorithm.
Three kinds of relaxed look-ahead:
– Delay relaxation
– Sum relaxation
– Delay transfer relaxation
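That the M-stage look-ahead is an exact rewriting of the one-step recursion can be checked numerically. In this sketch the sequences `e` and `x` are treated as given data (in a real filter e(n) depends on the weights), and the function names are hypothetical.

```python
def serial_update(w_start, mu, e, x, n_start, steps):
    """Apply w(n) = w(n-1) + mu*e(n)*x(n) for n = n_start+1, ..., n_start+steps."""
    w = list(w_start)
    for n in range(n_start + 1, n_start + steps + 1):
        w = [wi + mu * e[n] * xi for wi, xi in zip(w, x[n])]
    return w

def look_ahead_update(w_past, mu, e, x, n, M):
    """w(n) = w(n-M) + mu * sum_{i=0}^{M-1} e(n-i) x(n-i), with w_past = w(n-M)."""
    grad = [0.0] * len(w_past)
    for i in range(M):
        grad = [g + e[n - i] * xi for g, xi in zip(grad, x[n - i])]
    return [wi + mu * g for wi, g in zip(w_past, grad)]

if __name__ == "__main__":
    e = [0.0, 1.0, 2.0, 3.0, 4.0]
    x = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    w1 = [1.0, 1.0]                               # plays the role of w(1)
    print(serial_update(w1, 0.5, e, x, 1, 3))     # three one-step updates up to n = 4
    print(look_ahead_update(w1, 0.5, e, x, 4, 3)) # one 3-stage look-ahead update
```

The recursion now spans M samples, so the loop contains M delays instead of one, and these are the delays that retiming can distribute into the multipliers.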
28
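Example (look-ahead):
$$x(n+1) = a(n)\,x(n) + b(n)\,u(n)$$
$$x(n+2) = a(n+1)\,x(n+1) + b(n+1)\,u(n+1)$$
$$\phantom{x(n+2)} = a(n+1)\,a(n)\,x(n) + a(n+1)\,b(n)\,u(n) + b(n+1)\,u(n+1)$$
[Figure: look-ahead block diagram with signals a(n+2), b(n+2), u(n+2), x(n), and x(n+3), unit delays D, and a 3D delay. The extra delays can be used for retiming.]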
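The two-step expression can be verified against two applications of the original recursion. The helper names below are illustrative, and the coefficient sequences are arbitrary test values.

```python
def one_step(x_n, a_n, b_n, u_n):
    """Original recursion: x(n+1) = a(n) x(n) + b(n) u(n)."""
    return a_n * x_n + b_n * u_n

def two_step_look_ahead(x_n, a, b, u, n):
    """x(n+2) written directly in terms of x(n)."""
    return (a[n + 1] * a[n] * x_n
            + a[n + 1] * b[n] * u[n]
            + b[n + 1] * u[n + 1])

if __name__ == "__main__":
    a, b, u = [2, 3, 4], [1, 1, 2], [5, 6, 7]
    x0 = 1
    x1 = one_step(x0, a[0], b[0], u[0])
    x2 = one_step(x1, a[1], b[1], u[1])
    print(x2, two_step_look_ahead(x0, a, b, u, 0))
```

The look-ahead form needs more multiplications per output, but its feedback loop spans two samples, which is what leaves room for pipelining.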
29
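Delay relaxation:
– Assume that the gradient does not change significantly over $M_1$ samples. Then
$$\mathbf{w}(n) = \mathbf{w}(n-M) + \mu\sum_{i=0}^{M-1} e(n-i)\,\mathbf{x}(n-i)$$
is approximated by
$$\mathbf{w}(n) = \mathbf{w}(n-M) + \mu\sum_{i=0}^{M-1} e(n-M_1-i)\,\mathbf{x}(n-M_1-i)$$
Sum relaxation:
– The M terms in the gradient estimate are approximated by M' (M' < M) terms. Then
$$\mathbf{w}(n) = \mathbf{w}(n-M) + \mu\sum_{i=0}^{M-1} e(n-M_1-i)\,\mathbf{x}(n-M_1-i)$$
is approximated by
$$\mathbf{w}(n) = \mathbf{w}(n-M) + \mu\sum_{i=0}^{M'-1} e(n-M_1-i)\,\mathbf{x}(n-M_1-i)$$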
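Both relaxations can be expressed as parameters of a single update routine. This is a sketch with a hypothetical function name; `m1` plays the role of $M_1$ and `m_prime` the role of M'. Setting `m1 = 0` and `m_prime = M` recovers the exact M-stage look-ahead, while other settings give the approximations.

```python
def relaxed_look_ahead_update(w_past, mu, e, x, n, m1, m_prime):
    """w(n) = w(n-M) + mu * sum_{i=0}^{M'-1} e(n-M1-i) x(n-M1-i).

    w_past is w(n-M); e and x are sequences indexed by sample number.
    """
    grad = [0.0] * len(w_past)
    for i in range(m_prime):
        idx = n - m1 - i                          # delayed sample index
        grad = [g + e[idx] * xi for g, xi in zip(grad, x[idx])]
    return [wi + mu * g for wi, g in zip(w_past, grad)]

if __name__ == "__main__":
    e = [0.0, 1.0, 2.0, 3.0, 4.0]
    x = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    w_past = [1.0, 1.0]
    print(relaxed_look_ahead_update(w_past, 0.5, e, x, 4, 0, 3))  # exact, M = 3
    print(relaxed_look_ahead_update(w_past, 0.5, e, x, 4, 1, 2))  # M1 = 1, M' = 2
```

The relaxed version uses older and fewer gradient terms, so it is cheaper to build but no longer reproduces the original algorithm exactly; its convergence behavior has to be analyzed separately.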
30
Delay transfer relaxation:
[Figure: two panels, "Delay relaxation" and "Delay transfer", each containing a signal labeled a(n-D1).]
31
For example:
[Figure: adaptive filter block diagram with signals x(n), d(n-5), e(n), w(n-1), and w(n); four unit delays D and two 5D delays.]
32
Retiming:
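[Figure: the same block diagram after retiming, with signals x(n), d(n-5), e(n), w(n-1), and w(n); six unit delays D, one 5D delay, and one 4D delay.]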
33
Retiming/Pipelining:
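[Figure: retimed and pipelined block diagram with signals x(n) and d(n-2); unit delays D distributed through the structure and one 3D delay.]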