Real time implementation of a signal denoising approach based on eight-bits DWT
Int. J. Electron. Commun. (AEÜ) 66 (2012) 937– 943
International Journal of Electronics and Communications (AEÜ)
journal homepage: www.elsevier.de/aeue
Mohamed Benouaret a, Abdelhakim Sahour b, Saliha Harize a,∗
a Badji-Mokhtar University – Annaba, Algeria
b Khenchella University – Khenchella, Algeria
Article info

Article history:
Received 3 March 2012
Accepted 1 April 2012

Keywords: DWT; FPGA; Look-Up Table; Multi-Resolution-Analysis; Real-time; Logic-elements; Embedded systems

Abstract

The digital, real-time implementation of de-noising techniques based on the Discrete Wavelet Transform (DWT) is of great interest to many scientists. Indeed, the DWT provides an effective representation for studying a signal that can be described by a significant number of wavelet coefficients. Due to the orthogonality of the DWT, the noise has, on average, the same contribution to all the coefficients obtained. During the implementation phase on the FPGA target, a judicious choice of optimization algorithms must be made with respect to the internal logic elements needed for the synthesis of the behavioral description of the components describing the architecture of this technique, and with respect to the real-time constraint, in order to meet our requirement, which consists of restoring the signal quality without affecting the robustness of this method. Our outlook is, among others, to further optimize this algorithm by introducing the concept of “Multi-Resolution Analysis” and an appropriate structure of multipliers based on memories known as “Look-Up Tables”, which are very efficient in terms of logic elements, in order to solve the problem of information redundancy and therefore significantly simplify the calculations. Finally, to validate the results obtained, several test benches and simulations have been elaborated in the “Quartus II” and “Modelsim” programming environments.
∗ Corresponding author at: Department of Electronics, University Badji Mokhtar, BP 12, Sidi Ammar, Annaba, Algeria. Tel.: +213 771401352.
E-mail addresses: [email protected] (M. Benouaret), akim [email protected] (A. Sahour), [email protected] (S. Harize).
1434-8411/$ – see front matter © 2012 Elsevier GmbH. All rights reserved.
http://dx.doi.org/10.1016/j.aeue.2012.04.001

1. Introduction

One of the fields where wavelets have been successfully applied is data analysis. In particular, it has been demonstrated that wavelets produce excellent results in signal denoising. When a signal is decomposed using the DWT, the set of wavelet coefficients left correlates to the high frequency sub-bands. These high frequency sub-bands consist of the details in the data set. If these details are small enough, they might be omitted without substantially affecting the main features of the data set. Additionally, these small details are usually associated with noise; therefore, by setting these coefficients to zero the noise effect is removed.

This application requires that the signal is processed at the same rate imparted to the treatment cycle; in other words, the output samples of the denoising system should occur just after a tolerable time expressing the time dedicated to the treatment, since the calculations do not take place instantaneously. The objective of this study is to show that the denoising of a signal can be performed efficiently and cost effectively in real time using a reconfigurable circuit such as an FPGA, which offers the following advantages:
• Facility and flexibility of use.
• Suitability for rapid prototyping.
• Adaptability to dynamic reconfiguration of the circuit.
• Economy and ease of manufacturing tests.

Therefore, this approach can be summarized in three stages, namely:

1. Shrinkage of the detail coefficients of the analyzed signal by choosing a decomposition level.
2. Removal, in the thresholding phase, of a number of detail coefficients suspected of distorting the useful support of the signal.
3. Inverse wavelet transform of the coefficients that have undergone effective treatment.
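
As a minimal sketch of the three stages above, the following Python fragment performs a single-level db2 decomposition, hard-thresholds the detail coefficients and reconstructs the signal. It is an illustration only: the periodic signal extension, the floating-point arithmetic, the single decomposition level and all function names are assumptions made here, whereas the architecture described in this paper uses three levels and 8-bit look-up tables.

```python
import math

# Closed-form db2 scaling filter (assumption: exact floating-point values,
# not the 8-bit look-up tables used on the FPGA).
S3 = math.sqrt(3.0)
C = [c / (4.0 * math.sqrt(2.0)) for c in (1 - S3, 3 - S3, 3 + S3, 1 + S3)]
# Matching wavelet filter obtained with the alternating-flip rule.
W = [(-1) ** k * C[3 - k] for k in range(4)]


def analyze(x):
    """Stage 1: one-level decomposition with periodic extension (even length)."""
    n = len(x)
    approx = [sum(C[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    detail = [sum(W[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    return approx, detail


def hard_threshold(detail, thr):
    """Stage 2: zero every detail coefficient whose magnitude is below thr."""
    return [d if abs(d) >= thr else 0.0 for d in detail]


def synthesize(approx, detail):
    """Stage 3: inverse transform, the transpose of the orthogonal analysis."""
    n = 2 * len(approx)
    x = [0.0] * n
    for i in range(len(approx)):
        for k in range(4):
            x[(2 * i + k) % n] += C[k] * approx[i] + W[k] * detail[i]
    return x


def denoise(x, thr):
    approx, detail = analyze(x)
    return synthesize(approx, hard_threshold(detail, thr))
```

With a threshold of zero nothing is discarded, so `denoise(x, 0.0)` should return the input up to rounding error; this is a convenient sanity check before choosing a non-zero threshold.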
2. Overview
• The analyzing wavelet (db2), its corresponding filters and the decomposition level, which has been set to three, are given little attention with regard to the requirements of the adopted technique within this context.
• The use of an appropriate representation based on the “Look-Up Table” notion seems better suited, especially when the product operation needs to be performed in real time.
• Working with a resolution of 8 bits led us to establish a limiting device that confines the results obtained to the range [−128, 127].
• Our suggestion consists of introducing, during the stage of data synthesis, a suitable averaging filter whose role is to smooth out the residual fluctuations from the thresholding operation.

The Daubechies wavelets use overlapping windows, so the high frequency coefficient spectrum reflects all of the high frequency variations. This wavelet type has a balanced frequency response but a non-linear phase response. These properties justify the use of this type of wavelet, which is useful in signal denoising and is well suited for data compression applications [7]. The filter coefficients corresponding to db2 are shown in Table 1.

Table 1
Daubechies 4-tap wavelet coefficients.

Decomposition filters        Reconstruction filters
Hd(Z)       Gd(Z)            Hr(Z)       Gr(Z)
−0.1291     −0.48301         0.48301     −0.1291
0.2241      0.8365           0.8365      −0.2241
0.8365      −0.2241          0.2241      0.8365
0.48301     −0.1291          −0.1291     −0.48301
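
The values in Table 1 can be checked against the usual properties of an orthogonal filter bank. The sketch below, with helper names chosen here for illustration, tests that each filter has approximately unit energy, that the low-pass filter is orthogonal to its own double shift, and that the reconstruction filters are the time-reversed decomposition filters. The tolerance of 1e-3 is an assumption that reflects the rounding of the tabulated values.

```python
import math

# Coefficients as printed in Table 1 (rounded values).
HD = [-0.1291, 0.2241, 0.8365, 0.48301]    # decomposition low-pass
GD = [-0.48301, 0.8365, -0.2241, -0.1291]  # decomposition high-pass
HR = [0.48301, 0.8365, 0.2241, -0.1291]    # reconstruction low-pass
GR = [-0.1291, -0.2241, 0.8365, -0.48301]  # reconstruction high-pass

TOL = 1e-3  # assumed tolerance for the rounded table entries


def energy(f):
    """Sum of squared coefficients."""
    return sum(c * c for c in f)


def double_shift_product(f):
    """Inner product of a filter with itself shifted by two taps."""
    return sum(f[k] * f[k + 2] for k in range(len(f) - 2))


def check_table():
    """Return a dictionary of named property checks."""
    return {
        "unit_energy": all(abs(energy(f) - 1.0) < TOL for f in (HD, GD, HR, GR)),
        "lowpass_gain_sqrt2": abs(sum(HD) - math.sqrt(2.0)) < TOL,
        "highpass_zero_sum": abs(sum(GD)) < TOL,
        "double_shift_orthogonal": abs(double_shift_product(HD)) < TOL,
        "reconstruction_is_reversed": HR == HD[::-1] and GR == GD[::-1],
        "alternating_flip": GD == [(-1) ** (k + 1) * HD[3 - k] for k in range(4)],
    }
```

Calling `check_table()` returns one boolean per property, which makes it easy to see which relation fails if a coefficient is mistyped.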
3. System model

Initially, the architecture designer provides the library with a suite of building blocks covering the various components necessary for building the four-tap Daubechies wavelet filter. The architecture designer is also responsible for updating the library with new blocks and providing the generator with efficient filter templates corresponding to a particular architecture by using Look-Up Tables as the multiplier configuration. Because the DWT, unlike the Discrete Cosine Transform, is not unique, the filter template needs to be provided for each application with the parameters of the DWT (type, range values, word-length, 2s complement values, etc.). The system is illustrated in Fig. 1.

Fig. 1. Block diagram of three level decomposition using db2.

This representation shows the three levels of DWT decomposition. The pertinent information consists of the last approximation and the set of details carrying the noise, which are intended to undergo an effective denoising treatment before being delivered to the synthesis phase represented by the block diagram in Fig. 2.

Fig. 2. Block diagram of the reconstruction phase.

• It should be noted that a range of concepts that have been considered are detailed to better understand the need to include them in the specific structure of this architecture. These are, in particular, the blocks of delays (τ1 and τ2) attached to the thresholding operation in the reconstruction procedure. In addition, the thresholding task must be presented in two forms, soft and hard, as suggested in Fig. 3. To ensure that the results obtained during the processing phase do not exceed the assigned range, there must be a quantizer device with an underflow/overflow detector, which is essential for the integrative effect. The data width of all block outputs in this design is the same as the input width when the quantizer is placed just after the arithmetic circuit.

τ1: is the processing time of the first reconstruction phase followed by the limiting device, which confines the results to the −128 to 127 range, plus the time allotted to the thresholding operation.
τ2: designates the time lost during the second phase of data reconstruction associated with the limiting device, added to the time allotted to the thresholding operation. τ2 ≈ 3τ1.
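
The limiting device with its underflow/overflow detector can be sketched in a few lines of Python. The function names are hypothetical, and the truncation toward zero in `quantize` is an assumption inferred from the fixed-point columns of Table 2 rather than a documented property of the hardware.

```python
def saturate(value, low=-128, high=127):
    """Clamp an integer to the signed 8-bit range.

    Returns the clamped value and a flag that is True when an
    overflow or underflow was detected.
    """
    if value > high:
        return high, True
    if value < low:
        return low, True
    return value, False


def quantize(value):
    """Truncate toward zero (assumption), then saturate to [-128, 127]."""
    return saturate(int(value))
```

For example, `quantize(138.5057)` yields the saturated value 127 with the flag raised, while `quantize(-59.9604)` yields −59 with no flag.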
Fig. 3. Graphical representation of the two kinds of thresholding.

The two kinds of thresholding operations are expressed by the following equations [5].

\[ D_h(\lambda_r) = \begin{cases} x(n), & |x(n)| \ge \lambda_r \\ 0, & \text{otherwise} \end{cases} \]

\[ D_s(\lambda_r) = \begin{cases} D_h(\lambda_r) - \lambda_r, & D_h(\lambda_r) > 0 \\ D_h(\lambda_r) + \lambda_r, & D_h(\lambda_r) < 0 \\ 0, & \text{otherwise} \end{cases} \]

The architecture of the thresholding component is shown in Fig. 4. This representation includes, among others, three adders, four multiplexers, and three comparators in addition to an absolute value element.
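
The hard and soft thresholding rules given above translate directly into code. The following Python sketch applies them to a single integer sample; the function names are chosen here for illustration and the sketch does not model the adders, multiplexers and comparators of the hardware component.

```python
def hard_threshold(x, thr):
    """Keep the sample when its magnitude reaches the threshold, else 0."""
    return x if abs(x) >= thr else 0


def soft_threshold(x, thr):
    """Shrink the hard-thresholded sample toward zero by the threshold."""
    dh = hard_threshold(x, thr)
    if dh > 0:
        return dh - thr
    if dh < 0:
        return dh + thr
    return 0
```

With a threshold of 20, a sample of 30 is kept as 30 by the hard rule and shrunk to 10 by the soft rule, while a sample of −10 is set to zero by both.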
Fig. 4. VHDL structural design of the thresholding circuit.

4. Constant coefficient multiplier

A full multiplier admits the full range of inputs for each multiplicand [1]. If one of the multiplicands is a constant, then it is more convenient to establish a times table that holds only the column of pre-computed results for that constant; this is the constant coefficient multiplier, or KCM. Note that with a constant multiplier, all of the LUT inputs are available for the variable multiplicand. This makes the KCM more efficient than a full multiplier. Indeed, the LUT is a technique commonly used to accelerate numeric processing in real time applications [9]. The basic idea is to pre-compute the result of complex operations that can be expressed as a function of an integer value. The pre-computed results are typically stored in an array which is used at run-time rather than performing calculations that are expensive and take longer. Hence, during the filtering operation, whether in the analysis or in the synthesis phase, the use of a LUT is of vital importance to describe the multiplier concept. It is a powerful technique for reducing the size of a parallel hardware multiply-accumulate that is well suited for FPGA designs.

We can easily see, in Fig. 5, the reduced structure of a LUT containing 2^(α+1) operands, each one taking (2^α − α) bits, to produce a result with 2α bits. h_i denotes the different coefficients that the Daubechies wavelet can take. This approach consists of subdividing the input data into two parts (low and high order) to refer to the memory location where the result of the multiplication of the data and the normalized coefficient multiplied by 2^(2α) is stored. It is important to note that an offset is used in order to handle negative data in the range [−128, −1]. The offset is determined by the following equation.

\[ 16 \times 256 + (m - 16) \times h_i, \quad m = 8, \ldots, 15 \]
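
To illustrate the general KCM idea, the Python sketch below splits an 8-bit two's-complement input into two 4-bit halves and looks up pre-computed partial products of one constant coefficient. It is a generic illustration under stated assumptions: the table layout, the rounding of the entries, the treatment of the high nibble as a signed value and all names are choices made here, and the sketch does not reproduce the offset scheme of the equation above.

```python
SCALE = 256   # coefficient scaling factor, 2**8
NIBBLE = 4    # width in bits of each half of the 8-bit input


def build_tables(coefficient):
    """Pre-compute the partial products for one constant coefficient."""
    # Low nibble is unsigned: 0..15.
    low = [round(n * coefficient * SCALE) for n in range(16)]
    # High nibble is signed: addresses 8..15 stand for -8..-1.
    high = [round((m - 16 if m >= 8 else m) * coefficient * SCALE)
            for m in range(16)]
    return low, high


def kcm_multiply(x, tables):
    """Approximate x * coefficient for x in [-128, 127] with two look-ups."""
    low, high = tables
    byte = x & 0xFF                  # two's-complement bit pattern
    low_nibble = byte & 0x0F
    high_nibble = byte >> NIBBLE
    # The high partial product is weighted by 16 before the addition.
    acc = (high[high_nibble] << NIBBLE) + low[low_nibble]
    return acc / SCALE
```

Only two table reads, one shift and one addition are needed per product, which is the property that makes this structure attractive when no embedded multipliers are to be used.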
5. One level decomposition

Before starting the construction of a suitable system dedicated to the decomposition of the input data signal into two parts characterizing the approximations and details, we are quickly faced with two major problems, namely, the causality of the filter and the decimator which must be put just after this filter [2,4]. The causality of the filter can be achieved by adopting a sliding window that acts as a buffer whose size corresponds to the number of coefficients of the filter (4 in our case) to avoid the edge effect, whereas the decimator can be modeled by a state machine whose purpose is to take one sample out of two at a rate of Fs/2. This leads to the design of a special component called “splitter”, shown in Fig. 6, which undertakes to perform these two tasks [8,10].

The component “splitter” not only provides the requested data at the filter inputs but also acts as a frequency divider by two of the master clock in order to carefully perform the decimation operation, as can easily be seen in the diagram of Fig. 7.

Obviously, to meet this requirement, we must choose the falling edge to calculate the sum of the four operations resulting from the coefficient blocks. Hence, an overflow/underflow detector should be provided after each arithmetic calculation to be consistent with the constraint of the implementation (data size). Note that the same principle applies to the set of components constituting the general framework of this application, whether in the analysis phase or in the synthesis phase [3,6].

The timing diagram of Fig. 8 shows, among others, the data input sequence to be applied to the input of this system and its response consisting of the approximations and details of such a level decomposition. It can also be seen that the response occurs after a significant delay of 4 clock periods.

The values listed in Table 2 are the results of a Matlab simulation. They clearly corroborate the results obtained by our approach.

Fig. 5. The LUT architecture implementation using KCM concept with α = 4.

Fig. 6. One level decomposition architecture.

Fig. 7. Behavioral architecture of lower decomposition scheme.

Fig. 8. Quartus®II simulation results of one-level decomposition.
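
The sliding window and the decimator described in this section can be modeled in software as follows. This Python sketch is only a behavioral illustration: it uses closed-form floating-point db2 coefficients instead of look-up tables, a window initialized to zero, truncation toward zero followed by saturation, and two trailing zeros to flush the window; all of these, and the function names, are assumptions made here. Under these assumptions the first five outputs agree with the fixed-point columns of Table 2. The clock-level latency of the hardware is not modeled.

```python
import math

S3 = math.sqrt(3.0)
K = 4.0 * math.sqrt(2.0)
# Closed-form db2 decomposition filters, in the tap order of Table 1.
HD = [(1 - S3) / K, (3 - S3) / K, (3 + S3) / K, (1 + S3) / K]
GD = [-HD[3], HD[2], -HD[1], HD[0]]


def quantize(value):
    """Truncate toward zero, then confine to the signed 8-bit range."""
    return max(-128, min(127, int(value)))


def decompose_one_level(samples):
    """Causal 4-tap filtering followed by decimation by two."""
    window = [0, 0, 0, 0]            # window[0] holds the newest sample
    approx, detail = [], []
    for n, x in enumerate(samples):
        window = [x] + window[:3]    # slide the buffer by one sample
        if n % 2 == 0:               # decimator: keep one output out of two
            a = sum(h * w for h, w in zip(HD, window))
            d = sum(g * w for g, w in zip(GD, window))
            approx.append(quantize(a))
            detail.append(quantize(d))
    return approx, detail


# Input sequence of Table 2, followed by two zeros to flush the window.
A1, D1 = decompose_one_level([15, 25, 98, 125, -127, -75, -21, 0, 0])
```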
6. The interpolator device

The function of this device consists of inserting a zero between two consecutive samples, so that it can be qualified as an “up-sampling block”. The basic structure of the interpolator is illustrated in Fig. 9. The architecture includes a frequency doubler associated with a shift register loaded from a multiplexer at the rate of the initial clock.

The simulation results showing the correct behavior of this component are clearly seen on the timing diagram of Fig. 10.
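
The zero-insertion performed by the interpolator can be written as a short Python function. The name `upsample_by_two` is hypothetical, and the sketch emits a zero after every input sample, which is an assumption about how the doubled output rate is filled; the frequency doubler and shift register themselves are not modeled.

```python
def upsample_by_two(samples):
    """Insert a zero after each sample, doubling the output rate."""
    out = []
    for s in samples:
        out.append(s)   # the original sample
        out.append(0)   # the inserted zero
    return out
```

For instance, the input `[3, -5, 7]` becomes `[3, 0, -5, 0, 7, 0]`.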
Fig. 9. Interpolator architecture implementation.

Fig. 10. The chronogram of the interpolator device.

Table 2
Matlab results for one level decomposition.

                 Real-values                          Fixed-point values
Data sequence    Approximation: A1    Detail: D1      A1      D1
15               −1.9411              −7.2444         −1      −7
25               5.4692               −29.7796        5       −29
98               138.5057             140.6995        127     127
125              −59.9604             −40.3064        −59     −40
−127             −53.7891             14.4127         −53     14
−75              0.0000               0.0000          0       0
−21              0.0000               0.0000          0       0

7. Results and discussions

Our main concern is to present the implementation task of our algorithm in an explicit, concise and effective manner despite the limitations from which it suffers. So, during the development phase, some precautions were taken to achieve our goal and it seems useful to recall some important considerations:

1. Provide a mean filter whose principal mission is to improve the quality of the restored signal, and select an appropriate size to meet our expectations. Indeed, when each new data item is received, the output is the average of the four most recently received data. If we had an input stream of data (x) indexed by time (e.g. x_in(k) is the value of x_in at time k) and we received a new piece of data in each clock cycle, the equation for the output would be:

\[ y_{out}(k) = \frac{x_{in}(k) + x_{in}(k-1) + x_{in}(k-2) + x_{in}(k-3)}{4} \]

To create a system-level block-diagram of a filter (Fig. 11), we need to analyze the different components constituting this system to see the best way to implement it according to our needs, taking into account the processing time optimization and the number of logic elements assigned.

Fig. 11. The top-level structure of moving averaging filter.

2. Find an adaptive threshold based on the error between the original and the restored signal to effectively neutralize the harmful effect of noise carried by the signal.
3. Choose the decomposition level depending on the signal to be processed. In our case it is limited to three in order to test the efficiency of the methodology adopted in this application.
4. To increase the degree of severity with respect to this approach, it is convenient to select a support component characterized by the combination of three different segments corrupted by a noise caused by unwanted variations (Fig. 12).

Fig. 12. Simulation results using Modelsim tools; (a) original data stream; (b) delayed sequence of original signal; (c) restored signal using denoising algorithm; (d) filtered sequence of restored signal.

(D) designates the delay required for the processing task and (T) expresses the slowness that the moving average takes to perform its mission along with the threshold excursion ranging from 15 to 25.

From this illustration two critical remarks follow:

• If the clock period is set at 0.1 µs (10 MHz), the processing task will be achieved in 13 µs, which suggests the possibility to extend the size of data to further improve the quality of the received signal.
• In fact, only the details of levels 2 and 3 were submitted to the thresholding operation because they are thought to be perfectly decorrelated from the useful signal. However, undesirable residues carried by the data information remain, which justifies the use of an appropriate filter.

The signal denoising architecture has been synthesized using Modelsim, Quartus II tools and the board DE2 targeting Cyclone®II 2C35 Altera FPGA device, EP1C6Q240. The device contains 33 216 logic elements. The synthesis report is presented in Table 3.

Table 3
Hardware resource consumption.

                                      Used    Available    Percentage of use
Total logic elements                  821     33 216       821/33 216 = 2%
Total combinational functions         802     33 216       802/33 216 = 2%
Dedicated logic registers             148     33 216       148/33 216 < 1%
Embedded multiplier 9-bit elements    0       70           0/70 = 0%
Total PLLs                            1       4            1/4 = 25%
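
The four-sample mean filter of consideration 1 above can be sketched in Python as a streaming moving average. The zero-initialized window and the use of an arithmetic right shift by two for the division by 4 are assumptions made for this illustration, as is the function name; with the shift, negative sums are rounded toward minus infinity.

```python
from collections import deque


def moving_average4(samples):
    """Average of the four most recently received integer samples."""
    window = deque([0, 0, 0, 0], maxlen=4)   # assumed zero initial state
    out = []
    for x in samples:
        window.append(x)                     # the oldest sample drops out
        out.append(sum(window) >> 2)         # divide by 4 with a shift
    return out
```

For the input `[4, 8, 12, 16, 20]` the running sums are 4, 12, 24, 40 and 56, so the outputs are 1, 3, 6, 10 and 14.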
8. Conclusion

A denoising framework for an FPGA-based Discrete Wavelet Transform system has been presented. The methodology allows a signal processing application developer to generate a suitable FPGA configuration, mainly used for the DWT method, at a high level instead of spending considerable time designing and building at the gate and routing level. Thus, the end-user will benefit from the high performance of FPGA devices while designing at a high level with tools he is familiar with. The preliminary results are very promising; however, extensive further work needs to be done towards the extension of the system to handle different arithmetic representations using Look-Up Tables to speed convergence, save considerable time and reduce the number of logic elements needed for such an application, taking into account the data size (16 bits) with different wavelet analysis and synthesis schemes along with different architectures. This contribution will certainly encourage the researcher to follow with confidence the path leading to tangible applications that involve the wavelet notion and its implementation on an FPGA target.
References
[1] Sahour A, Benouaret M. FPGA implementation of Daubechies polyphase-decimator filter. Int J Comput Appl (0975 – 8887) 2010;7(October (10)).
[2] Pang J, Chauhan S. FPGA design of speech compression by using discrete wavelettransform. In: Proceedings of the world congress on engineering and computerscience 2008 WCECS 2008. 2008.
[3] Chen PY. VLSI implementation for one-dimensional multilevel lifting-basedwavelet transform. IEEE Trans Comput 2004;53(April (4)):386–98.
[4] Fan W, Gao Y. FPGA design of fast lifting wavelet transform. In: 2008 congresson image and signal processing (CISP’08), vol. 4. 2008. p. 362–5.
[5] Donoho DL. De-noising by soft-thresholding. IEEE Trans Inf Theory 1995;41:612–27.
[6] Chilo J, Lindblad T. Hardware implementation of 1D wavelet transform on anFPGA for infrasound signal classification. IEEE Trans Nucl Sci 2008;55(1 (Part1)):9–13.
[7] Zhuang S, Carlsson J, Li W, Palmkvist K, Wanhammar L. GALS based approachto the implementation of the DWT filter bank. In: Proc. 7th international conf.on signal processing. 2004. p. 567–70.
[8] Uzun IS. Rapid prototyping–Framework for FPGA-based discrete biorthogonalwavelet transforms implementation. IEE Proceedings Vision, Image and SignalProcessing 2006;153(December (6)):721–34.
[9] Nagabushanam M. Design and implementation of parallel and pipelined distributive arithmetic based discrete wavelet transform IP core. Eur J Sci Res 2009;35(3):378–92. ISSN 1450-216X.
[10] Carletta J, Giakos G, Patnekar N, Fraiwan L, Krach F. Design of a field programmable gate array-based platform for real-time denoising of optical imaging signals using wavelet transforms. Measurement 2004;36:289–96.