
IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 48, NO. 9, SEPTEMBER 2001

Perceptual Time-Frequency Subtraction Algorithm for Noise Reduction in Hearing Aids

Min Li, Hector G. McAllister*, Norman D. Black, and Trina Adrián De Pérez

Abstract—Sensorineural hearing disorders are a major and universal community health problem. In many cases, hearing aids offer the only solution for people suffering from such disorders. Unfortunately, existing aids do not provide any improvement in intelligibility of the signal when background noise is present. A hearing aid system should ideally simulate auditory processes, including those aspects of the speech signal that are perceptually important. This work presents a new integrated approach to the design of a digital hearing aid, based on a wavelet transform, as well as a formulation of the temporal and spectral psychoacoustic model of masking. Within the model, the Perceptual Time-Frequency Subtraction (PTFS) algorithm is developed to simulate the masking phenomena and reduce noise in single-input systems. Results show that the use of the PTFS yields a significant improvement in speech quality, especially in unvoiced portions. Additionally, the noise component during periods of silence has been attenuated by up to 20 dB. This new noise reduction method is expected to be applicable in a variety of applications, including digital hearing aids and portable communication systems (e.g., cellular telephones).

Index Terms—Auditory filtering, auditory masking, digital hearing aids, hearing loss, noise reduction, signal-to-noise ratio, spectral subtraction, wavelets.

I. INTRODUCTION

HEARING disorders are a major community health problem throughout the world [1]. Despite the considerable body of research carried out in the area of hearing impairment, many problems still exist because of a lack of full understanding of the mechanisms which are affected as a result of the impairment. There are many causes of hearing loss: for example, it can be congenital, caused by disease, the result of exposure to loud noise, or even the result of a trauma. For most people, however, hearing loss is simply age related. The main types of hearing loss are conductive, sensorineural, and mixed. Conductive hearing loss is due to an abnormality in the outer or middle ear that obstructs the passage of sound waves to the inner ear. This can be corrected by medical or surgical treatment, or by simple amplification through a hearing aid.

Manuscript received May 6, 1999; revised May 15, 2001. Asterisk indicates corresponding author.

M. Li is with the Northern Ireland Technology and Engineering Center (NITEC), NORTEL Networks, Antrim BT36 6XA, Northern Ireland.

*H. G. McAllister is with the Faculty of Informatics and Northern Ireland Bioengineering Center (NIBEC), University of Ulster at Jordanstown, Antrim BT37 0QB, Northern Ireland (e-mail: [email protected]).

N. D. Black is with the Faculty of Informatics and Northern Ireland Bioengineering Center (NIBEC), University of Ulster at Jordanstown, Antrim BT37 0QB, Northern Ireland.

T. A. De Pérez is with the Departamento de Electrónica, Signal Processing Group, Universidad Simón Bolívar, Caracas 1080, Venezuela.

Publisher Item Identifier S 0018-9294(01)07440-7.

If a patient has damage to the inner ear, sensory deafness will result; similarly, if the auditory nerve is impaired, a neural hearing loss occurs. A sophisticated hearing test is required to distinguish between these two groups, and they are usually combined under the term sensorineural hearing loss. A patient with a mixed impairment will exhibit both a nerve loss and a conductive loss. The characteristics of the hearing-impaired population who are candidates for amplification have changed substantially over the past 30 years. The Medical Research Council commissioned a study [2] to document the prevalence, causes, description, understanding, and consequences of hearing disorders in the United Kingdom. The results of this study indicate that the majority (4:1) of impairments originate in a sensorineural disorder when the better ear is considered. Overall, considering both ears, there are twice as many sensorineural disorders as purely conductive disorders. No surgery can be performed to help the vast majority of people with sensorineural hearing loss due to cochlear pathology. Therefore, the only choice available for these hearing-impaired individuals at this time is to wear a hearing aid. Unfortunately, existing aids do not provide any improvement in intelligibility of the signal when background noise is present. In this paper, a new algorithm, perceptual time-frequency subtraction (PTFS), for noise reduction in single-input hearing aid systems is presented. It is a subtraction scheme in the wavelet domain where the properties of the masking phenomenon are incorporated to obtain a nonlinear weighting function. Section II describes the anatomy and physiology of the auditory system, since any hearing aid system must simulate auditory processes, including the aspects of the speech signal that are perceptually important. A functional model of the forward and inverse auditory system is presented in Section III. In Section IV, the PTFS algorithm is described within the context of a complete hearing aid scheme. The PTFS algorithm performance is compared with three other standard noise reduction methods; methodology and experiments are in Section V. Conclusions and perspectives are given in Section VI.

II. ANATOMY AND PHYSIOLOGY OF THE AUDITORY SYSTEM

Two phenomena, hearing and sound perception, are involved in how the sound signal entering the listener's ear is converted into a linguistic message. The auditory system can be subdivided into the outer, middle, and inner ear. The outer ear collects sound waves and directs them to the ear canal. The middle ear converts the sound energy received into mechanical energy transmitted to the inner ear. The inner ear consists of a liquid-filled tube called the cochlea. Two membranes divide the interior of


the cochlea: Reissner's membrane and the basilar membrane (BM). On the BM lies the organ of Corti, which contains about 30 000 sensory hair cells. The endings of the auditory nerve terminate on these hair cells, causing neural firings (electrical potentials) to propagate in the auditory nerve. Since the BM varies gradually in shape and stiffness along its length, its frequency response varies accordingly. Frequency resolution along the BM is best at low frequencies. Whether a sound can be heard depends on its intensity and spectrum, and perceptibility is discussed in terms of hearing thresholds. The cochlear processes of BM vibration and neural firings are highly nonlinear, with the result that perception of sound energy at one frequency is dependent on the distribution of sound energy at other frequencies, as well as on the time course of energy before and after the sound: this is the phenomenon of masking, in which the perception of one sound is obscured by the presence of another. Simultaneous sounds cause frequency masking, where a low-frequency sound generally masks a higher-frequency one. Sounds delayed with respect to one another can cause temporal masking of one or both sounds. Frequency selectivity refers to the ability of the auditory system to separate or resolve (to a limited extent) the components in a complex sound. Indications are that the sensitivity and frequency selectivity shown by the whole organism are derived from that of the BM [3]. The auditory system can be considered as a bank of bandpass filters. The frequency resolution of the ear's filtering mechanisms is described in terms of critical bands. The widths of the critical bands increase with increasing center frequency. The frequency resolution can be measured in terms of the minimum frequency separation at which two tones can be distinguished. Experiments have been carried out to show that the multi-independent channel articulation band model approximately corresponds to the critical band behavior of the BM [4]. Typically, a sensorineural loss also gives better hearing for the low frequencies than for the high frequencies. Cochlear damage also affects the frequency selectivity (broader than normal), and the rejection of background noise will be much less effective. The temporal and spectral models of masking will be considered in the next section to develop a new algorithm to reduce noise in hearing aid designs.
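For orientation, the growth of critical bandwidth with center frequency can be quantified with the Glasberg and Moore equivalent rectangular bandwidth (ERB) approximation; the paper itself does not commit to a specific formula, so the following sketch is an illustrative assumption, not part of the authors' method.

```python
# Illustrative only: the Glasberg-Moore ERB approximation shows how
# auditory filter width grows with center frequency.

def erb_bandwidth_hz(fc_hz: float) -> float:
    """ERB (in Hz) of the auditory filter centered at fc_hz."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

if __name__ == "__main__":
    for fc in (250, 500, 1000, 2000, 4000):
        print(f"fc = {fc:5d} Hz -> ERB ~ {erb_bandwidth_hz(fc):6.1f} Hz")
```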

III. A NEW WAVELET-BASED FORWARD AND INVERSE AUDITORY MODEL

A good auditory model should incorporate the physiological components of the human auditory system. Experiments from psychoacoustic tests have revealed that the peripheral auditory system behaves as a bank of overlapping bandpass filters having relatively constant Q values. The concept of the auditory filter is closely linked to the behavior of the BM in the inner ear. This function of auditory filtering has been modeled by a number of researchers over the past 20 years using traditional filterbanks [5]. The short-time fast Fourier transform (FFT) is used to transform the signal into the frequency domain before a weighting function is applied to the FFT coefficients. However, the FFT method assumes the signal to be stationary and is, therefore, insensitive to its varying features. Wavelet transforms, offering the capability of producing good frequency resolution at low frequency and good time resolution at high frequency (similar to human audition), have recently found application in models of the auditory periphery. The nature of the current research suggests that the wavelet approach to the modeling of the BM would be preferable for hearing aid design [6], [7].

Chabries [8] has constructed a normal auditory model that includes several functions which are basic to the human auditory system, including the following:

1) general bandpass filters, which provide the filtering performed by the BM;

2) envelope detectors, which estimate the intensity of the signal from each of the bandpass filters as detected by the hair cells, followed by a nonlinear intensity gain which logarithmically compresses the signal;

3) highpass filters, which provide loudness adaptation, required to derive the instantaneous intensity from the intensity of the envelope of the acoustic pressure field;

4) hyperbolic tangents, which functionally mimic the firing of the inner hair cells and the associated neural network;

5) multiplicative intrinsic noise sources, which are additive in the logarithmic domain, and exponentiators and detectors that, in a simple way, mimic the function of loudness detection in the brain.

However, although Chabries' model exploits the property of the critical bands, it performs subband processing of signals with a fixed time resolution and omits the masking property of the normal auditory system.
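To make the stage chain above concrete, the following is a minimal single-channel sketch of a Chabries-style model (bandpass filtering, envelope detection, logarithmic compression, adaptation highpass, hyperbolic tangent). All filter orders, corner frequencies, and constants are illustrative assumptions, not values taken from [8].

```python
# A minimal single-channel sketch of the stage chain listed above:
# bandpass -> envelope -> log compression -> adaptation highpass -> tanh.
import numpy as np
from scipy.signal import butter, lfilter, hilbert

def auditory_channel(x, fs, f_lo, f_hi):
    # 1) bandpass filter standing in for one section of the BM
    b, a = butter(2, [f_lo / (fs / 2), f_hi / (fs / 2)], btype="band")
    band = lfilter(b, a, x)
    # 2) envelope detection (hair-cell intensity estimate) followed by
    #    logarithmic intensity compression
    env = np.abs(hilbert(band))
    log_env = np.log10(env + 1e-8)
    # 3) highpass filtering of the log envelope models loudness adaptation
    bh, ah = butter(1, 2.0 / (fs / 2), btype="high")
    adapted = lfilter(bh, ah, log_env)
    # 4) hyperbolic tangent mimics inner-hair-cell/neural saturation
    return np.tanh(adapted)
```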

We propose a new model adopting the basic structure of Chabries' model, but replacing the time-fixed bandpass filtering with wavelet transforms in order to mimic the time-frequency analysis of the critical bands and to produce the envelope of the signal. The resolution of the bandwidths of the wavelet filters can be designed to be similar to the critical bands in the auditory system. On the other hand, nonlinear properties such as masking and suppression may not be represented when using linear system analysis and the concept of the cochlea as a frequency analyzer. For such systems, time-frequency representations are also highly desirable. Therefore, the masking suppression effect is integrated into the proposed model for noise reduction.

Creation of a model which provides the characteristics of the auditory tract enables the modeling of hearing for acoustically impaired subjects. The model transforms an input sound signal into a meta space related to normal hearing perception and can be combined with an equivalent inverse model of a damaged auditory system (which transforms from the normal hearing meta space back into a prewarped sound pressure space) to enable near normal hearing perception for a hearing impaired ear.

An acoustic signal may be represented as

$$s(t) = e(t)\,c(t) \qquad (1)$$

where $e(t)$ is a slowly varying positive-valued envelope whose intensity determines the neural firing rate and $c(t)$ is a rapidly varying vibration providing the encoding of the frequency information.

In the proposed model of the auditory system, the wavelet transform maps the envelope $e(t)$ and encodes information about $c(t)$ through the tonotopic organization of the wavelet filters. These are believed to be the features used by the auditory cortex for sound perception. The hypothesis which will guide the analysis of hearing compensation is the assumption that the eighth nerve is normal and, therefore, the neural firing patterns and information provided to the eighth nerve from the compensated-degraded ear mechanism ought to be the same as from the normal ear mechanism. Hence, an inverse model is developed which will be used to transform the normal hearing perceptual space signal at the outputs of the hyperbolic tangent subsystem of the model, referred to hereafter as a perceptual space, to the input acoustic stimulus required to produce a similar perceptual space response for the damaged ear. Mathematically, the human auditory system model, which operates on an acoustic input stimulus and transforms the signal into the perceptual space, is characterized by the operator $H$ for normal hearing and $\hat{H}$ for damaged hearing.

Fig. 1. A complete hearing aid scheme.

The normal hearing perceptual space signal is given as

$$p(t) = H\{s(t)\} \qquad (2)$$

where $s(t)$ is the acoustic stimulus and $p(t)$ is the perceptual space model subsystem output. The damaged ear equivalent perceptual space quantities are represented with accented variables and, hence, $H$ is replaced with $\hat{H}$, giving

$$\hat{p}(t) = \hat{H}\{s(t)\}. \qquad (3)$$

One may express the inverse transformation required to transform the normal perceptual space intensity into an input stimulus, prewarped by the inverse operations, for the damaged ear as

$$\hat{s}(t) = \hat{H}^{-1}\{p(t)\}. \qquad (4)$$

The damaged ear operating on this prewarped acoustic stimulus provides the perceptual space output

$$\hat{p}(t) = \hat{H}\{\hat{s}(t)\} = \hat{H}\{\hat{H}^{-1}\{p(t)\}\} = p(t). \qquad (5)$$

Under the hypothesis that $\hat{H}$ sufficiently mimics the operation of the human auditory system, the hearing impaired individual listening to the prewarped signal, $\hat{s}(t)$, provides the final auditory system operation $\hat{H}$ in the foregoing equation with their own ear. Therefore, the task required here is to provide an $\hat{s}(t)$ such that the damaged ear will perceive $p(t)$. The transformation required is

$$\hat{s}(t) = \hat{H}^{-1}\{H\{s(t)\}\}. \qquad (6)$$

The set of transformations suggested by $\hat{H}^{-1}H$ is specifically defined as a series of subsystems and operations to allow a reduction in the complexity of the required processing. A single orthogonal set of wavelet bandpass filters is chosen as the filterbank in the initial portion of the human auditory model. With wavelet transforms, these two functions can be incorporated into an integrated time-frequency domain system. It can be deduced that the combination of a forward model of a normal auditory system and an inverse model of a hearing impaired auditory system (in other words, a hearing aid) reduces to a set of wavelet bandpass filters, each of which is followed by a noise suppression mechanism and a homomorphic automatic gain control (AGC). The signal can be processed separately in each of the subbands, at a low data rate, and then inverse wavelet filters used to return it to the time domain. A complete diagram, which includes all channels of the hearing compensation system, is shown in Fig. 1.
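A schematic rendering of this chain (analysis filterbank, per-band noise suppression, per-band gain, synthesis) might look as follows. Here `suppress()` and `agc()` are hypothetical placeholders standing in for the PTFS weighting and the homomorphic AGC, and the wavelet choice and level count are assumptions.

```python
# Schematic sketch of the complete chain of Fig. 1: wavelet analysis,
# per-band noise suppression and gain, then inverse wavelet synthesis.
import numpy as np
import pywt

def suppress(c, sigma):
    # placeholder: soft thresholding at the universal threshold
    lam = sigma * np.sqrt(2 * np.log(max(c.size, 2)))
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

def agc(c, target=0.1):
    # placeholder: crude per-band gain toward a target RMS, gain capped
    rms = np.sqrt(np.mean(c ** 2)) + 1e-12
    return c * min(target / rms, 10.0)

def hearing_aid_frame(x, wavelet="db4", levels=5, sigma=0.01):
    coeffs = pywt.wavedec(x, wavelet, level=levels)  # analysis filterbank
    out = [coeffs[0]]                                # keep approximation band
    for c in coeffs[1:]:                             # process each detail subband
        out.append(agc(suppress(c, sigma)))
    return pywt.waverec(out, wavelet)                # synthesis to time domain
```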

Within this scheme, we will present a new PTFS algorithm. The wavelet transform of the input signal can be used to replicate the normal auditory system and resolve frequency and temporal localization of speech spectral components for sensorineural hearing impairments. The auditory masking of noise can also be modeled by thresholding wavelet transform coefficients with relevant noise strength to remove noise dynamically as a single-input method.


IV. THE PERCEPTUAL TIME-FREQUENCY SUBTRACTION ALGORITHM

Rejection of environmental noise is one of the major problems to be addressed by hearing aids. Such noise has power spectral energy that is often predominantly low-frequency weighted and may mask out the relatively important, but weaker, high-frequency components of speech, causing loss of speech intelligibility. In particular, for subjects with recruitment of loudness, volume settings that compensate for elevated thresholds amplify both the signal and the background noise to above hearing threshold and signal masking levels [9]. Most of the existing commercial hearing aids are monaural aids with a single-microphone input to receive the signal. Noise reduction in single-input systems is a difficult problem when compared with systems with multiple inputs. The results of numerous noise reduction studies led the authors of several literature reviews [10], [11] to conclude that existing single-microphone noise reduction methods were incapable of providing consistent improvements in intelligibility, because the existing algorithms for single-input systems exploit only the differences in the spectrum to separate speech from noise.

In this paper, we consider a single-microphone input system. The one-microphone input source $y(n)$ is generally assumed to contain speech $s(n)$ plus additive noise $d(n)$. The wavelet transform of $y(n)$ can be used to replicate the normal auditory system and resolve frequency and temporal localization of speech spectral components for sensorineural hearing impairments. The auditory masking of noise can also be modeled by thresholding wavelet transform coefficients with relevant noise strength to remove noise dynamically as a single-input method. In this section, a new wavelet-based approach for reducing white Gaussian noise is developed for single-microphone aids. It will be shown that better results can be obtained by exploiting psychoacoustic features such as critical band filtering and auditory masking.

Spectral subtraction is widely used to suppress additive noise in hearing aids. Assuming the additive signal model for the received noisy signal, in spectral subtraction the modulus of the speech spectrum $|\hat{S}(f)|$ is estimated by subtracting an estimate of the noise magnitude spectrum, $|\hat{D}(f)|$, from the modulus of the input source spectrum $|Y(f)|$. The noise spectrum is estimated during quiet periods or during pauses in the desired signal. Therefore, one of the fundamental assumptions in this processing is that the estimated value of the noise magnitude spectrum prior to speech activity does not change significantly when speech is present. Furthermore, due to the difficulties in estimating the random phase of the noise, it is assumed that by removing the magnitude spectrum of the noise from the magnitude of the noisy signal and combining the result with the phase of the noisy speech, it is possible to reconstruct the desired signal faithfully. This argument is consistent with some of the human auditory perceptual features, since it has been demonstrated [12] that speech perception is determined primarily by the magnitude spectrum of the signal and is relatively insensitive to the phase of the signal. As reported by Vaseghi [13], significant noise reduction can be achieved by this particular filtering scheme. However, some residual noise is also observed in the processed speech. One of the main reasons for the presence of this residual noise is the spectral error resulting from the difference between the actual noise spectrum in each frame and the spectrum estimated during quiet periods. The amplitude of the error produces a musical sounding noise in the processed speech, and this residual noise is referred to as musical noise. The audible musical noise will be produced mainly in the frequency regions where the speech is very weak (masking phenomenon). If the noisy speech within these regions is discarded, there is no effect on the overall speech quality. However, the Fourier transform applies uniform weighting to all the frequency components, and as a result the errors are given equal weighting in resynthesis. These observations motivated the development of an alternative subtraction algorithm in the perceptual domain.
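A minimal sketch of the conventional magnitude-subtraction scheme described above follows, assuming a leading noise-only segment supplies the noise estimate; the frame length and non-negativity floor are illustrative choices, not the authors' settings.

```python
# Conventional magnitude spectral subtraction: estimate |D(f)| from a
# noise-only segment, subtract it frame by frame from |Y(f)|, and
# reuse the noisy phase for resynthesis.
import numpy as np

def spectral_subtraction(y, noise_seg, frame=256):
    noise_mag = np.abs(np.fft.rfft(noise_seg[:frame]))  # noise estimate from silence
    out = np.zeros(len(y))
    for start in range(0, len(y) - frame + 1, frame):
        Y = np.fft.rfft(y[start:start + frame])
        mag = np.maximum(np.abs(Y) - noise_mag, 0.0)    # subtract noise magnitude
        out[start:start + frame] = np.fft.irfft(mag * np.exp(1j * np.angle(Y)), frame)
    return out
```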

The proposed method initially transforms the magnitude spectrum of the noisy speech, $|Y(f)|$, into auditory excitation patterns. Excitation patterns represent the activity produced by the signal at a particular location on the BM. The patterns are obtained using the wavelet transform. If $W_c(f)$ is the auditory filter with center frequency $c$, the excitation pattern at that frequency, $E(c)$, is given by

$$E(c) = \int |W_c(f)|^2\,|Y(f)|^2\,df. \qquad (7)$$

The proposed algorithm is a subtraction of noise from noisy speech in the wavelet domain that incorporates perceptual features. It is, therefore, suitably referred to as PTFS. The new algorithm has a similar form to conventional spectral subtraction, which operates in the spectral domain, and is also related to the work of Donoho [14], which discusses "wavelet denoising" to filter Gaussian noise; this technique computes a wavelet transform of the noisy data samples to obtain the discrete transform coefficients. A hard or soft thresholding method is then applied to the resulting transform coefficients.
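For reference, the two Donoho estimators can be sketched directly on a vector of wavelet coefficients using the universal threshold $\sigma\sqrt{2\ln N}$ from [14]:

```python
# Donoho's hard and soft thresholding applied to a vector w of wavelet
# coefficients, with the universal threshold sigma*sqrt(2 ln N).
import numpy as np

def hard_threshold(w, sigma):
    lam = sigma * np.sqrt(2 * np.log(w.size))
    return np.where(np.abs(w) > lam, w, 0.0)               # keep or kill

def soft_threshold(w, sigma):
    lam = sigma * np.sqrt(2 * np.log(w.size))
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)   # shrink toward zero
```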

In the time domain, the additive noise model $y(n) = s(n) + d(n)$ also produces an additive noise model for the wavelet coefficients

$$Y_{j,k} = S_{j,k} + D_{j,k} \qquad (8)$$

where $S_{j,k}$ is the wavelet transform of the clean signal $s(n)$, and $D_{j,k}$ is the wavelet transform of the white noise $d(n)$ with standard deviation $\sigma$. The wavelet coefficients of noise $D_{j,k}$ have zero mean and variance $\sigma^2$ if the wavelet transform is orthogonal [15]. The variance is conserved and white noise is spread out over all the transform plane, yielding a sparsity of zero. Every wavelet coefficient, therefore, contributes noise of variance $\sigma^2$, but only very few wavelet coefficients contribute to the signal. These facts will be used to help develop algorithms for the detection and estimation of signals using wavelets. Assuming that the white noise variance is known, the discrete wavelet transform of the received information $y(n)$ is taken to obtain $Y_{j,k}$. The problem is to estimate an unknown deterministic sequence $S_{j,k}$ from the observations $Y_{j,k}$. The filtering formulation in conventional spectral subtraction is adopted here to calculate the speech spectrum as

$$|\hat{S}(f)| = H(f)\,|Y(f)| \qquad (9)$$


Fig. 2. Weighting functions applied by PTFS showing the comparison with conventional spectral subtraction.

where $H(f)$ is the zero-phase magnitude response of the filter, and thus a weighting function that is computed according to the signal-to-noise ratio (SNR) in each critical band (auditory filter). It is postulated that the auditory system perceives sounds based on the SNR [or signal-to-masker ratio (SMR)] in each critical band. This concept is modeled to perform in the wavelet domain as

$$\hat{S}_{j,k} = H_{j,k}\,Y_{j,k} \qquad (10)$$

where $H_{j,k}$ is some function of the SNR. It is noted that in conventional spectral subtraction the estimated filter is based on the SNR in each frequency band and is given by

$$H(f) = 1 - \frac{|\hat{D}(f)|}{|Y(f)|}. \qquad (11)$$

In the proposed scheme, $H_{j,k}$ is estimated by considering the linear model in (8). According to this assumption, the excitation produced by the noisy signal is a linear sum of the excitations produced by the signal and noise alone. Consequently, the weighting function is chosen to make $\hat{S}_{j,k}$ proportional to the excitation produced by the signal alone. Thus the proposed function, $H_{j,k}$, in (10) is given by

$$H_{j,k} = 1 - \frac{C\,\sigma}{|Y_{j,k}|} \qquad (12)$$

where $Y_{j,k}$ are the wavelet coefficients of the noisy signal, $C$ is a constant, and $\sigma$ is the standard deviation of the noise.
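Under the reconstruction of (12) used here, the three-region weighting can be sketched as follows. The threshold settings ($T_n = C\sigma$ with $C = \sqrt{2\log N}$, and $T_s = 2.5\,T_n$) follow the values reported later in the paper and should be read as an interpretation, not as the authors' exact implementation.

```python
# Three-region PTFS weighting: weight 0 below the noise-masking
# threshold T_n, weight 1 above the signal-masking threshold T_s, and
# 1 - C*sigma/|Y| in the signal-noise region between them.
import numpy as np

def ptfs_weights(Y, sigma, ts_over_tn=2.5):
    C = np.sqrt(2 * np.log(Y.size))
    t_n = C * sigma                                   # noise-masking threshold
    t_s = ts_over_tn * t_n                            # signal-masking threshold
    absY = np.abs(Y)
    H = 1.0 - C * sigma / np.maximum(absY, 1e-12)     # signal-noise region
    H = np.where(absY < t_n, 0.0, H)                  # noise dominates: zero weight
    H = np.where(absY > t_s, 1.0, H)                  # signal masks noise: unity
    return H

def ptfs_denoise(Y, sigma):
    return ptfs_weights(Y, sigma) * Y                 # apply (10): S_hat = H * Y
```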

The filtering approach is convenient for including the nonlinear properties of auditory masking. In particular, it is seen that $H_{j,k}$ can incorporate this masking behavior and produce a nonlinear transfer function, as shown in Fig. 2. The function is divided into the three regions shown: noise-masking, signal-noise, and signal-masking regions. For the purposes of comparison, Fig. 2 also illustrates the transfer function of conventional spectral subtraction: the difference between the two functions is clear.

Hard thresholding has only two values for the weighting function: zero if the SNR is lower than the noise-masking threshold and one otherwise. Soft thresholding looks similar to conventional spectral subtraction except where the SNR is lower than this threshold. As discussed previously, when two signals are presented together within a critical band and the excitation produced by the weaker signal is not sufficiently large, the excitation pattern of that auditory filter will be dominated by the strong signal. There are two extreme situations shown in Fig. 2 that are of special interest. In the first case, the signal is so strong that the noise is rendered inaudible, which happens in the signal-masking region. In the second case, the noise is sufficiently strong to make the signal inaudible, which appears in the noise-masking region. Therefore, as far as the desired signal is concerned, above a certain SNR (the signal-masking threshold; case 1, signal is dominant), the auditory system applies a unity weighting, while below a certain SNR (the noise-masking threshold; case 2, noise is dominant), the auditory system gives a zero weighting. This nonlinear phenomenon (suppression) is easily incorporated into the filter function. In the signal-noise region, both signal and noise are audible, and normal subtraction is effective in removing noise from the signal. From (12), it can be seen that the new algorithm is a generalization of Donoho's wavelet denoising: when the signal-masking threshold equals the noise-masking threshold it is hard thresholding, while when the signal-masking threshold is made arbitrarily large it is soft. The hard thresholding of Donoho's algorithm suffers from a serious problem in that it produces an oscillating spike residual noise, since the noise in the signal-noise region is not removed and is audible. The soft thresholding improves significantly on this residual noise; however, important high-frequency speech in the signal-masking region is removed when the subtraction takes place unnecessarily in this region. The PTFS identifies such regions, subtracts noise in the signal-noise region, and leaves the signal-masking region intact, since the noise there is inaudible and has perceptually no effect on speech. As a result, the PTFS is an optimization of the wavelet subtraction including both the hard and soft thresholding methods.

Determining the thresholds $T_n$ and $T_s$ requires an estimate of the subjective loudness of both signal and noise to quantify the masking thresholds. Unfortunately, there are no methods available at present to perform this particular task successfully. However, it is possible to perform some objective and subjective testing to determine the thresholds. Alternatively, the thresholds can be estimated approximately from the data presented with respect to SMR [12]. It has been shown that if the stronger signal is 2.5 times stronger than the weaker one (i.e., 4 dB above), the second signal will not be detected. This corresponds, in terms of the noisy signal, to $T_n = C\sigma$ and $T_s = 2.5\,T_n$. It is not possible to justify this choice of value analytically, but as will be demonstrated later, the estimates do not introduce any noticeable distortion in the processed speech. For convenience of discussion, the time-domain thresholds $T_n$ and $T_s$ are used for the following description instead of the corresponding SNR thresholds.

It can be seen that the above processing is primarily based on the auditory perceptual features. Although each step in this processing is justified by objective measures and psychoacoustic data, it is important to show that this method could reduce the amount of musical noise produced by the conventional subtraction method. It can be seen that wavelet-based algorithms are not required to estimate the subband noise spectrum, since wavelet transforms preserve the deviation and properties of white noise, and the signal and noise have different wavelet sparsities. These wavelet properties make it easy to estimate the signal from the noise. Furthermore, as has been pointed out, contributions to the residual error come mainly from low SNR regions. However, the proposed nonlinear weighting scheme eliminates this error by setting the weights in these regions to zero. It can be argued that with such low SNRs the wavelet coefficients provide very little information about the speech. Therefore, when residual noise is present, the speech in low SNR regions does not contribute significantly toward intelligibility, while the residual error causes the musical noise. Hence, suppressing this information has the effect of reducing the musical noise without decreasing the intelligibility of the signal any further.

V. METHODOLOGY AND EXPERIMENTS

The performance of the proposed algorithm was investigated with synthetic and real speech signals, and the results were compared with those obtained using Donoho's wavelet denoising algorithms and those obtained using the conventional spectral subtraction scheme. The effect of wavelet properties on the noise reduction was examined. Original signals for isolated words were obtained from a database of continuous speech, recorded from a radio news bulletin and sampled at 8 kHz. Stationary random noise was digitally added to generate the noisy signal. Seven input SNRs (from −10 to 20 dB) were used to simulate different noise conditions. The noise level was estimated from a 100-ms segment of the noise before the start of the speech. Due to the short-time stationarity properties of speech, signals were processed in short blocks; a block length of 8 ms was used. All the algorithms considered have been evaluated using objective measures and informal listening tests, and comparisons are based on the same test data and measures.
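The test-signal preparation described above can be sketched as follows, assuming white Gaussian noise scaled to a prescribed global SNR and a 100-ms noise-only lead-in; the helper name and defaults are illustrative.

```python
# Add white Gaussian noise to clean speech at a target global SNR and
# keep a 100-ms noise-only lead-in for the noise-level estimate.
import numpy as np

def make_noisy(speech, snr_db, fs=8000, lead_ms=100, seed=0):
    rng = np.random.default_rng(seed)
    lead = int(fs * lead_ms / 1000)
    noise_pow = np.mean(speech ** 2) / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_pow), len(speech) + lead)
    noisy = noise.copy()
    noisy[lead:] += speech            # noise-only segment precedes the speech
    return noisy, noise[:lead]        # noisy signal and noise-only lead-in
```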

It is well known that the traditional SNR measure has little perceptual meaning and is thus not a good measure of hearing aid performance. In this paper, the segmental SNR measure has been used to provide an objective indication of the performance of the proposed algorithm. The segmental SNR is measured by computing an SNR measure for each speech frame and averaging these measures over the entire signal. It is more reliable than a single global SNR computed for the whole speech signal.

In terms of objective measures, the segmental gain in the SNR is evaluated and a segmental distortion is computed from the clean signal $s(n)$. To this end, each of the four algorithms is expressed as a filtering of the noisy speech. This filtering is applied to the noisy speech signal $y(n)$ to obtain the filtered signal $\hat{s}(n)$. The segmental gain and the distortion are computed by

$$G = \mathrm{SNR}_{\mathrm{seg}}(\hat{s}) - \mathrm{SNR}_{\mathrm{seg}}(y) \qquad (13)$$

where

$$\mathrm{SNR}_{\mathrm{seg}}(x) = \frac{1}{M}\sum_{m=0}^{M-1} 10\log_{10}\frac{\sum_{n=0}^{N-1} s^2(mN+n)}{\sum_{n=0}^{N-1} \left[s(mN+n) - x(mN+n)\right]^2} \qquad (14)$$

$$D = \frac{1}{M}\sum_{m=0}^{M-1}\frac{\sum_{n=0}^{N-1} \left[s(mN+n) - \hat{s}(mN+n)\right]^2}{\sum_{n=0}^{N-1} s^2(mN+n)} \qquad (15)$$

where $N$ is the number of signal samples in a frame and $M$ is the number of frames. The gain $G$ represents the improvement in SNR produced by the signal estimate $\hat{s}(n)$, while the distortion $D$ sometimes reflects the comfort or nature of the processed signal.
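A sketch of these segmental measures follows, based on the reconstruction of (13)-(15) above (the paper's exact definitions may differ in detail); a 64-sample frame corresponds to the 8-ms blocks at 8 kHz used here.

```python
# Segmental SNR, gain (13), and distortion (15) as reconstructed above.
import numpy as np

def seg_snr(clean, test, frame=64):
    vals = []
    for i in range(0, len(clean) - frame + 1, frame):
        s = clean[i:i + frame]
        e = s - test[i:i + frame]
        vals.append(10 * np.log10(np.sum(s ** 2) / (np.sum(e ** 2) + 1e-12)))
    return float(np.mean(vals))

def seg_gain(clean, noisy, filtered, frame=64):
    # (13): improvement in segmental SNR produced by the filtering
    return seg_snr(clean, filtered, frame) - seg_snr(clean, noisy, frame)

def seg_distortion(clean, filtered, frame=64):
    # (15): framewise error-to-signal energy ratio, averaged over frames
    vals = []
    for i in range(0, len(clean) - frame + 1, frame):
        s = clean[i:i + frame]
        e = s - filtered[i:i + frame]
        vals.append(np.sum(e ** 2) / (np.sum(s ** 2) + 1e-12))
    return float(np.mean(vals))
```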

In the first part of the experiment, white noise was added to the original speech to generate the noisy signal. The noisy speech was processed using the proposed filtering as defined in (12). The noise-masking threshold $T_n$ was fixed and related to Donoho's threshold $\sigma\sqrt{2\log N}$, corresponding to $C = \sqrt{2\log N}$ in (12), while the signal-masking threshold $T_s$ ranged from $T_n$ up to $100\,T_n$, with $T_s/T_n = 1, 2, 2.5, 3, 10, 20, 50, 100$. As discussed above, Donoho's denoising schemes are two special cases of the proposed filtering: hard thresholding with $T_s = T_n$, and soft thresholding with the largest setting, $T_s = 100\,T_n$, since the signal is significantly less than $100\,T_n$. Therefore, this testing can not only find the optimal filter (thresholds for noise masking and signal masking)


but also produce results to compare the optimal filtering withboth the hard and soft thresholding schemes of Donoho.

The PTFS algorithm was used to process the same noisy signal for all the different values of $T_s$. The performance of the three schemes was compared at different input SNRs. As shown in Figs. 3 and 4, the quality of the output speech was seen to be affected by the signal-masking threshold. The value of $T_s$ did make a significant difference to the speech quality. The hard thresholding method ($T_s = T_n$) obtains the highest noise reduction gain; however, the distortion measures (not illustrated in Fig. 4 because they are out of scale) are the highest and much larger than for the other settings: $D$ is higher than 500 at −10 dB SNR and falls under 100 only above 2 dB SNR. The processed signal exhibits many small spikes (spurious oscillations), and from informal listening tests these spurious oscillations sound like a clicking noise. On the other hand, it can be seen that the soft thresholding method has the lowest noise reduction gain, since it removes useful information of the signal in addition to the noise in the signal-masking region.

It is found that when $T_s = 2.5\,T_n$ the SNR gain is the highest below 0 dB and the third highest above 0 dB, while the distortion is the smallest. The processed signal with $T_s = 2.5\,T_n$ is given in Fig. 5. There was no audible distortion or decrease in the quality of the processed speech. From informal listening tests it was concluded that the perceptual algorithm consistently reduced the spike noise and improved the quality of the speech compared with Donoho's schemes. A similar degradation in performance was observed with the optimal perceptual scheme. In particular, some residual noise was produced by the perceptual algorithm at very low SNRs (below −5 dB). This degradation was observed because, in most frequency regions, the signal is not strong enough to mask the residual noise successfully. However, the amount of residual noise can be reduced further by increasing the noise-masking threshold $T_n$ and thereby subtracting an overestimate of the noise spectrum. Although this will decrease the noise, the output speech will be of lower quality, since the low-energy regions of speech will also be removed.

As discussed above, the quality of the output speech was also seen to be affected by the noise-masking threshold. As the value of $T_n$ is increased, a consistent degradation in the speech quality was noticed. $T_n$ affects all the components with low SNRs; consequently it removes the noise as well as the speech at those frequencies. Even though the speech was more distorted, the system resulted in significant noise reduction. As the value of $T_n$ is decreased, the noise becomes increasingly audible. The algorithm was tested with different values for $T_n$, and a good compromise between speech quality and noise reduction was achieved in agreement with Donoho's threshold, $\sigma\sqrt{2\log N}$.

The excitation patterns of the signal are very important in estimating the optimal filter in the perceptual scheme. They are determined by the bandwidth and properties of the auditory filter. The effect of auditory filter bandwidth on the quality of the processed speech was investigated. A clear loss in the quality of speech was observed as the bandwidth of the filter was increased (i.e., as the number of frequency bands decreases). This behavior can be expected, since increasing the bandwidth will result in a corresponding loss in frequency resolution, and consequently there will be some smearing of frequencies. However, despite the poor quality of speech, a significant amount of noise was removed.

Fig. 3. Noise-reduction gain of PTFS with variable threshold C.

Fig. 4. Distortion measure of PTFS with variable threshold C.

Fig. 5. Wavelet reconstruction with the optimal PTFS. (a) Original speech. (b), (d) Speech plus noise at different SNRs. (c), (e) Reconstructions.

Fig. 6. Comparison of four noise reduction algorithms: hard thresholding, soft thresholding, spectral subtraction, and optimal PTFS.

The filters used in this project are derived in Daubechies [16] and Johnston [17]. The algorithm was tested with three different wavelets: 1) Daubechies 4D/4D symmetric/antisymmetric QMF filters; 2) Cohen and Daubechies 7D/9D symmetric/symmetric QMF filters; and 3) Johnston's 32D QMF filters. The noise reduction achieved by the system was significantly affected by the choice of wavelet (length of the filter coefficients and attenuation at the cutoff edges). The faster the attenuation in the transition band of the filter, the less error noise the wavelet transmits to the output. Daubechies' filters have very slow attenuation, while the Johnston filter is faster. Experiments also show that Johnston's filter gives better performance than Daubechies'. Therefore, employing filters with fast attenuation leads to a clear improvement in the performance of the algorithm.

The effect of sparsity on the detection and estimation of signals was also tested. The parameters for the examples are as follows: sparsity ranges from 0.5 to 0.9 and the received-information SNR is 0 dB. The corrupting noise is additive white Gaussian noise. When sparsity is high, the algorithm achieves a high gain, increasing as sparsity increases. In fact, it increases sharply when sparsity is greater than 0.8. This is not surprising; the more dispersed the signal, the harder it is to find. The algorithm does produce a positive gain at every point tested. The same noisy speech was processed using conventional magnitude subtraction. As expected, the noise power was significantly reduced and the quality of the speech was improved. However, the musical noise was audible in the processed speech, as predicted. The noise not only reduced the quality of the speech but distracted the listener's attention from the main speaker. Fig. 6 shows a comparison between the conventional spectral subtraction and the optimal PTFS ($T_s = 2.5\,T_n$), as well as Donoho's hard and soft thresholding methods. The SNR gain from the PTFS is greater than (or equal to) the gain obtained by the conventional subtraction. The gain obtained with the optimal settings is about 10 dB at the −10 dB SNR condition, 7 dB at 0 dB, and 15.5 dB at 20 dB. The gain improvement at the higher input SNRs is more noticeable because of the more accurate masking threshold. The enhanced speech quality has been greatly improved, especially in unvoiced portions. The noise component during silence periods has been attenuated by 20 dB.

As judged from informal listening tests, the spectral subtraction suffers from the well-known residual "musical noise" and, as a result, the enhanced speech is distorted. The residual noise experienced using Donoho's methods can be described as clicky rather than musical, and the speech signal appears distorted. The proposed algorithm removes noise almost completely and has no audible distortion. It was also found, through informal listening tests, that the proposed scheme produced much better subjective speech quality than the conventional schemes despite recording similar SNR gain below −5 dB. This is primarily because the perceptual scheme leads to better estimates of the noise spectra than the conventional approaches. Therefore, it implies that the perceptual algorithm will produce superior noise reduction at low SNRs where other schemes may fail. Hence, the algorithm is expected to perform better over a wider range of input SNRs than normal spectral subtraction.

VI. CONCLUSION

This model of speech perception of the normal and impaired auditory systems, developed principally to facilitate the design of hearing aids, provides a basis for developing new speech enhancement algorithms. The work presented has concentrated on the design of an algorithm based on the features of auditory perception.

Discrete wavelet transforms were used to model the human auditory system, to achieve critical band filtering, and to provide the framework for masking and loudness sensation modeling. A novel feature of the proposed model is that every stage in the processing is consistent with psychoacoustic properties. This unified modeling approach addresses most of the major problems in hearing aids.

Specifically, a new algorithm, PTFS, for noise reduction in single-input systems has been developed. It is a subtraction scheme in the wavelet domain, and the properties of the masking phenomenon are incorporated to obtain a nonlinear weighting function. The method can be interpreted as a generalization of Donoho and Johnstone's techniques. In contrast to Donoho's method, the optimal weighting filtering has been found to give the best performance. In the PTFS method, three regions of SNR are identified as being significant to the analysis: very low SNR, very high SNR, and intermediate SNR, making it possible to reduce resulting errors by further processing. The auditory filtering and the masking characteristics of the auditory system are exploited to reduce the noise at frequencies with low SNR as well as high SNR. The masking thresholds were evaluated from extensive objective quality measures and informal listening tests. The results reveal that the PTFS can produce superior performance when compared with the other methods.

A tradeoff between the amount of noise reduction and the quality of the speech was observed, and optimal filtering was obtained to provide considerable smoothing, removing the unwanted noise information from the low SNR regions. The experimental results show that the PTFS is successful in removing additive noise from a contaminated speech signal. The SNR gain obtained for the optimal settings ($T_s = 2.5\,T_n$ and $T_n = \sigma\sqrt{2\log N}$) is approximately 10 dB at the −10 dB SNR condition, 7 dB at 0 dB, and 15 dB at 20 dB. Similar noise reduction resulted at very low SNRs, below −5 dB, as compared to that achieved using the conventional spectral subtraction technique, with improvements in the PTFS performance for signals higher than −5 dB. Donoho's soft thresholding has similar performance to the spectral subtraction below 10 dB and achieves better results above 10 dB. Although Donoho's hard thresholding gives the best performance at high SNR conditions, it has the worst performance at low SNRs (below 0 dB). The enhanced speech quality with the optimal filtering has been greatly improved, especially in unvoiced portions. The noise component during silence periods has been attenuated by a maximum of 20 dB. In particular, the musical noise and clicking noise were reduced without degrading the quality of the processed speech. The perceptual scheme produced more noise reduction, even at low SNRs, than the other schemes as a consequence of the wavelet auditory filtering, which leads to better speech estimates in time and frequency, and of the masking phenomenon being exploited accurately by identifying the three perceptually different SNR regions. More importantly, the results demonstrate the advantages of incorporating the auditory features into a wide variety of noise suppression algorithms.

The PTFS was affected by the bandwidth and properties of the auditory filters. Better results were obtained with bands corresponding to those observed in the basilar membrane. It has also been shown that sparsity has a dramatic effect on the detection and estimation of signals. A signal with high sparsity stands out in the transform domain, having wavelet coefficients that are readily determined by the algorithm. The corresponding signal estimates are thus very good. In contrast, a signal with low sparsity is spread out in the transform domain and is somewhat indistinguishable from noise. Thus, these signals are difficult to estimate. These results have been confirmed for the signal estimation algorithm, demonstrating that sparsity is a key parameter in the detection and estimation of signals using wavelets.

This paper has demonstrated the definite advantages of incorporating auditory features into the processing algorithm. In addition to noise reduction, this technique can be applied in speech coding and audio compression applications. Exploiting these auditory features can remove redundant information in the signal and subsequently may reduce the number of bits required to achieve a desired speech quality.

This noise reduction method has relevance in applicationsother than digital hearing aids, in particular portable commu-nication systems (e.g., cellular telephones).

REFERENCES

[1] J. Adam, "Medical electronics," IEEE Spectrum, pp. 92–95, 1996.

[2] A. C. Davis, "Hearing disorders in the population: First phase findings of the MRC National Study of Hearing," in Hearing Science and Hearing Disorders, M. E. Lutman and M. P. Haggard, Eds. New York: Academic, 1983.

[3] J. J. Zwislocki, "Cochlear function—An analysis," Acta Otolaryngol. (Stockholm), vol. 100, pp. 201–209, 1981.

[4] J. B. Allen, "How do humans process and recognize speech?," IEEE Trans. Speech Audio Processing, vol. 2, pp. 567–577, Apr. 1994.

[5] A. Kaelin, A. Lindgren, and S. Wyrsch, "A digital frequency-domain implementation of a very high gain hearing aid with compensation for recruitment of loudness and acoustic echo cancellation," Signal Processing, vol. 64, pp. 71–85, 1998.

[6] M. H. Goldstein, "Auditory periphery as speech signal processor," IEEE Eng. Med. Biol., pp. 186–196, Apr. 1994.

[7] N. A. Whitmal, J. C. Rutledge, and J. Cohen, "Reducing correlated noise in digital hearing aids," IEEE Eng. Med. Biol., pp. 88–96, Sept. 1996.

[8] D. M. Chabries, D. V. Anderson, T. G. Stockham, and R. W. Christiansen, "Application of a human auditory model to loudness perception and hearing compensation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 1995, pp. 3527–3530.

[9] M. Killion, "The K-Amp hearing aid: An attempt to present high fidelity for persons with impaired hearing," Amer. J. Audiology, vol. 2, pp. 52–73, 1993.

[10] H. Dillon and R. Lovegrove, "Single-microphone noise reduction systems for hearing aids: A review and evaluation," in Acoustical Factors Affecting Hearing Aid Performance, G. A. Studebaker and I. Hochberg, Eds. Boston, MA: Allyn and Bacon, 1993.

[11] D. G. Jamieson, R. L. Brennan, and L. E. Cornelisse, "Evaluation of a speech enhancement strategy with normal-hearing and hearing-impaired listeners," Ear and Hearing, vol. 16, no. 3, pp. 274–286, 1995.

[12] B. C. J. Moore, Introduction to the Psychology of Hearing. New York: Academic, 1989.

[13] S. V. Vaseghi, Advanced Signal Processing and Digital Noise Reduction. New York: Wiley and Teubner, 1996.

[14] D. L. Donoho, "De-noising by soft-thresholding," IEEE Trans. Inform. Theory, vol. 41, pp. 613–627, May 1995.

[15] H. M. Polchlopek and J. P. Noonan, "Wavelets, detection, estimation, and sparsity," Signal Processing, no. 7, pp. 28–37, 1997.

[16] I. Daubechies, Ten Lectures on Wavelets (CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 61). Philadelphia, PA: SIAM, 1992.

[17] D. Johnston, "A filter family designed for use in quadrature mirror filter banks," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Apr. 1980, pp. 291–294.


Min Li received the B.Sc. degree in electronic engineering in 1982 from Taiyuan University of Technology, Taiyuan, P.R.C. She received the M.Phil. degree with the dissertation "Hardware Design and Specification of Speech Signal Processors" from the University of Wales, Swansea, U.K., in 1996, and the Ph.D. degree with the dissertation "Wavelet and Filterbank Modeling of the Normal and Impaired Auditory Systems" from the University of Ulster, N. Ireland, U.K., in 1998.

In 1993, she joined the N. Ireland Bio-Engineering Centre, the University of Ulster, and worked on digital hearing aid design with advanced signal processing algorithms and DSP chips. She joined Nortel Networks in 1997 and has since been working on ASIC and FPGA design for telecommunication and data communication networks in the communication research and development center, Belfast Lab, Nortel Networks, N. Ireland, U.K., where she is now a Senior ASIC Design Engineer. She has published several papers in IEEE Transactions, IEE Proceedings, IEEE ICASSP, the IEEE Digital Signal Processing Workshop, the IEEE EMBS Workshop, ICSPAT, and other speech-related and bioengineering-related journals and conferences. Her main research interests include digital signal processing and its applications in speech processing, human auditory modeling, hearing-aid design and communication, digital signal processing hardware, and ASIC design in data communication and telecommunications.

Hector G. McAllister received the B.Sc. (Hons.) degree in electrical and electronic engineering from Queen's University, Belfast, N. Ireland, in 1977. He received the M.Sc. degree in electronics from Queen's University, Belfast, in 1983 and the Ph.D. degree from the University of Ulster in 1996.

He spent one year in industry as a Development Engineer in medical electronic equipment and then became a Senior Research Officer in Queen's University, Faculty of Medicine. During this time, his main interests were in the study of electroencephalography and evoked potentials and the development of recording and sensory stimulation methods. In 1987, he was employed as Lecturer in Informatics at the University of Ulster and appointed Senior Lecturer in 1993. His current main interests are in the development of digital signal processing-based hearing aids and the recording and analysis of auditory and cognitive evoked potentials. He has over 50 publications in conferences and journals, all within the area of biomedical engineering.

Dr. McAllister is a Chartered Engineer and a Member of the Institution of Electrical Engineers.

Norman D. Black is Dean of the Faculty of Informatics, Professor of Medical Informatics, and Director of the Institute of Health Informatics at the University of Ulster at Jordanstown, N. Ireland. He has a long history in the field of biomedical engineering, as Director of the Northern Ireland BioEngineering Center (NIBEC) from 1994 to March 1999. He is the current President of the European Society for Engineering and Medicine (ESEM), has over 100 scientific publications, and has attracted considerable external research earnings. He has substantial experience in research project management, having been principal investigator on numerous EU/Research Council/Industry-funded projects involving individual project budgets of EURO 2.5 m and totaling in excess of EURO 7 m. He is also the Coordinator for an EU-funded ALFA networking program; the current project is the third to be funded and is valued at EURO 417 000.

Trina Adrián de Pérez received the EE and MSEE degrees from Simón Bolívar University, Caracas, Venezuela, in 1977 and 1982, respectively.

Since 1977, she has been a Professor in the Electronic Engineering Department at Simón Bolívar University. At present, she is a Titular Professor and the Electronic Engineering Coordinator at Simón Bolívar University. She has taught undergraduate and postgraduate courses in communication and signal processing. Her research work is in adaptive noise cancellation, time-frequency distributions, and wavelet applications. She has 25 articles in proceedings of conferences and ten articles in journals.