improvement of audio capture in handheld devices through ...unh.edu/ece/department/senior...



Improvement of Audio Capture in Handheld Devices through ...unh.edu/ece/Department/Senior Projects/ECE792_2011/Projects/PDA…

Improvement of Audio Capture in Handheld Devices through Digital Filtering

Problem

Handheld devices use low-quality microphones to keep costs down, which makes speech recognition on these devices less reliable.

Choosing a Test Sound

Two test sounds were compared to determine which produced the better filter: a sine wave sweeping in frequency from 80 Hz to 8000 Hz, and pink noise. Speech from four different people was recorded and then filtered using the coefficients that the least mean squares (LMS) algorithm produced from each test sound. The recordings were then run through two speech recognizers: the one built into Windows XP and the one built into Windows 7. The Windows 7 recognizer had a higher baseline success rate than the XP recognizer. Overall, the filter created from the pink noise fixed more speech recognition errors than the filter created from the sine wave (15 phrases versus 6 with the Windows 7 recognizer). In addition, all but one of the phrases fixed by the sine-wave filter were also fixed by the pink-noise filter.
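The two test sounds can be generated roughly as sketched below. This is a minimal pure-Python illustration, not the project's MATLAB or C# code. The function names `sine_sweep` and `pink_noise` are hypothetical, and several details are assumptions not stated on the poster: a 16 kHz sample rate (which puts the 8000 Hz end of the sweep at the Nyquist frequency), a 4-second duration, a linear rather than logarithmic sweep, and the Voss-McCartney method for approximating pink noise.

```python
import math
import random


def sine_sweep(f_start=80.0, f_end=8000.0, duration=4.0, fs=16000):
    """Linear sine sweep (chirp) from f_start to f_end Hz.

    Assumptions: 16 kHz sample rate, 4 s duration, and a linear sweep;
    the poster does not specify any of these.
    """
    n_samples = int(duration * fs)
    rate = (f_end - f_start) / duration  # sweep rate in Hz per second
    out = []
    for n in range(n_samples):
        t = n / fs
        # Phase is the integral of the instantaneous frequency f_start + rate * t.
        phase = 2.0 * math.pi * (f_start * t + 0.5 * rate * t * t)
        out.append(math.sin(phase))
    return out


def pink_noise(n_samples, n_rows=16, seed=None):
    """Approximate pink (1/f) noise with the Voss-McCartney method.

    Row i holds a random value that is refreshed every 2**i samples; summing
    the rows puts more energy at low frequencies than white noise has.
    """
    rng = random.Random(seed)
    rows = [0.0] * n_rows
    out = []
    for n in range(n_samples):
        for i in range(n_rows):
            if n % (1 << i) == 0:
                rows[i] = rng.uniform(-1.0, 1.0)
        out.append(sum(rows) / n_rows)  # dividing by n_rows keeps it in [-1, 1]
    return out
```

For example, `sweep = sine_sweep()` gives 64,000 samples, and `pink_noise(len(sweep), seed=1)` gives a pink-noise signal of the same length; either would then be played from the computer while the handheld records.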

For the Future
• Test more filter lengths, iterations, gains, sound files
• Insert filter into Windows Mobile recording stack
• Add options to the program to change the filter creation parameters

Jonathan Brown · Sam Marlin · Advisor: W. T. Miller

Proposed Solution

Using digital signal processing, a filter will be created to “undo” the distortion caused by the poor-quality microphone. The process can generate a custom filter for any handheld that runs the Windows Mobile platform, tailored to the acoustic characteristics of that particular device. Reference audio files with known frequency content will be used to determine which frequencies the handheld attenuates.

Testing the Code

All of the code was first written in MATLAB for testing and then ported to C# for final deployment.

Steps of the Solution Process

• Setup: Set up the computer, speakers, and handheld device.
• Record Test Sound: Play an ideal test sound from the computer while recording it on the handheld.
• Create the Filter: The program on the computer compares the test sound with the recorded sound to create the filter.
• Save the Filter: The filter coefficients are saved in the handheld's registry, where any audio recording or voice recognition application can use them.

Lining up the Sound Files

Each test sound file began with 10 cycles of a 440 Hz sine wave. This known marker was used to line up the two sound files through cross-correlation.
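One way to locate the marker is to cross-correlate the known 10-cycle, 440 Hz tone against a recording and take the lag with the largest correlation. The sketch below assumes a 16 kHz sample rate and uses hypothetical names (`tone_marker`, `find_offset`). It is a brute-force search written for clarity; an FFT-based correlation would be faster for long search ranges.

```python
import math

# Assumption: 16 kHz sample rate; the poster does not state it.
FS_HZ = 16000


def tone_marker(freq=440.0, cycles=10, fs=FS_HZ):
    """Hypothetical helper: the 10 cycles of 440 Hz at the start of each file."""
    n_samples = round(cycles * fs / freq)
    return [math.sin(2.0 * math.pi * freq * n / fs) for n in range(n_samples)]


def find_offset(marker, recorded, max_lag):
    """Return the lag (in samples) where `marker` best matches `recorded`.

    Brute-force cross-correlation: for every candidate lag, sum the products
    of the marker with the recorded samples starting at that lag, and keep
    the lag with the largest sum.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        if lag + len(marker) > len(recorded):
            break  # the marker would run past the end of the recording
        score = sum(m * recorded[lag + i] for i, m in enumerate(marker))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Finding the lag in both the ideal file and the handheld recording, then dropping the leading samples each lag indicates, is one way to bring the two files to a common starting point. Because the marker is periodic, the correlation also has smaller peaks about one period (roughly 36 samples at 16 kHz) away from the true lag; the full-overlap peak is the largest as long as the marker dominates the start of the file.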

Problem

The alignment found by this cross-correlation did not keep the sound files lined up for their whole length. The time steps (sample spacing) differ slightly between the two sounds, so after about 1000 samples the files were noticeably out of alignment. To fix this, cross-correlation was applied again to match the indexes in one file to those in the other.
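The repeated cross-correlation can be sketched as a block-by-block search. In this hypothetical `local_offsets` function, the 1000-sample block length echoes the drift described above, and the ±5-sample search window is an assumption. Each block's best lag is searched only near the previous block's lag, so the offset can follow slow drift without jumping to an unrelated match.

```python
def local_offsets(ideal, recorded, block=1000, search=5, start_lag=0):
    """Track slowly drifting alignment between two recordings.

    The ideal file is split into blocks of `block` samples. For each block,
    cross-correlation is evaluated only for lags within `search` samples of
    the previous block's lag, and the best lag is kept. The result gives,
    per block, the offset of the ideal samples within the recorded file.
    Assumptions: block size and search width are illustrative choices.
    """
    offsets = []
    prev = start_lag
    for b0 in range(0, len(ideal) - block + 1, block):
        best_lag, best_score = prev, float("-inf")
        for lag in range(prev - search, prev + search + 1):
            if b0 + lag < 0 or b0 + lag + block > len(recorded):
                continue  # candidate window falls outside the recording
            score = sum(ideal[b0 + i] * recorded[b0 + lag + i] for i in range(block))
            if score > best_score:
                best_lag, best_score = lag, score
        offsets.append(best_lag)
        prev = best_lag
    return offsets
```

Sample `b0 + i` of the ideal file then corresponds to sample `b0 + offsets[b] + i` of the recording for the block `b` containing it.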

Creating the Filter

The least mean squares algorithm was used to create the filter coefficients. The algorithm only works if the test files are lined up in time. Because it has several adjustable parameters, tests were run on the sine wave test sound file to find the best filter parameters for this problem. The choice depended on two values: the RMS of the error signal e(n) used in the algorithm, and whether the filter coefficients changed as the number of iterations was varied. Values tested: gain 0.001, 0.0001, and 0.00001; iterations from 500 to 3900 in steps of 200; filter size 257.

$$e(n) = I(n) - \sum_{k=0}^{FS-1} FC(k)\, NI(n-k)$$

$$FC(k) \leftarrow FC(k) + g \times e(n) \times NI(n-k), \qquad k = 0, \dots, FS-1$$

where
I = the ideal waveform
NI = the non-ideal waveform
FC = the filter coefficients
FS = the filter size
e = the equalization error
g = the gain
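The two update equations translate almost directly into code. The sketch below is a pure-Python illustration, not the project's MATLAB or C# implementation, and the function names are hypothetical; the defaults are the poster's final values. The poster does not define an “iteration”, so here it is assumed to mean one full pass over the aligned recordings. Pure Python is far too slow for 257 taps over seconds of audio, so this version only makes the update rule concrete.

```python
def lms_filter(ideal, non_ideal, filter_size=257, gain=0.0001, iterations=1500):
    """Adapt FIR coefficients FC with the least mean squares (LMS) rule.

    For each sample n (taps NI(n - k), k = 0 .. filter_size - 1):
        e(n)  = I(n) - sum_k FC(k) * NI(n - k)
        FC(k) = FC(k) + g * e(n) * NI(n - k)
    Assumption: one "iteration" is one full pass over the aligned recordings.
    Returns (coefficients, RMS of e(n) over the final pass).
    """
    fc = [0.0] * filter_size
    rms = 0.0
    end = min(len(ideal), len(non_ideal))
    for _ in range(iterations):
        sq_err, count = 0.0, 0
        # Start at filter_size - 1 so every NI(n - k) index is valid.
        for n in range(filter_size - 1, end):
            taps = [non_ideal[n - k] for k in range(filter_size)]
            e = ideal[n] - sum(c * x for c, x in zip(fc, taps))
            for k in range(filter_size):
                fc[k] += gain * e * taps[k]
            sq_err += e * e
            count += 1
        rms = (sq_err / count) ** 0.5 if count else 0.0
    return fc, rms


def apply_filter(fc, signal):
    """FIR-filter a recording with the learned coefficients (zeros before n = 0)."""
    return [
        sum(fc[k] * signal[n - k] for k in range(len(fc)) if n - k >= 0)
        for n in range(len(signal))
    ]
```

After training on the aligned test sound, `apply_filter(fc, speech)` would filter a speech recording with the learned coefficients, which is the role the saved coefficients play on the handheld.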

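Windows 7 Speech Recognizer

Noise         Unfiltered   Filter from Sine Wave   Filter from Pink Noise
Recognized    158          159                     167
Broke         -            5                       6
Fixed         -            6                       15

“Broke” counts phrases recognized correctly before filtering but not after; “Fixed” counts phrases recognized correctly only after filtering (158 − 5 + 6 = 159 and 158 − 6 + 15 = 167).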

[Figure: Spectrogram of Sine Wave Test Signal (frequency in Hz, 0 to 8000, vs. time in s)]
[Figure: Welch Power Spectral Density Estimate of Pink Noise Test Signal (power/frequency in dB/Hz vs. frequency in kHz)]
[Figure: Frequency Response of Sine Wave Test Signal (magnitude in dB and phase in degrees vs. normalized frequency in ×π rad/sample)]
[Figure: Frequency Response of Pink Noise Test Signal (magnitude in dB and phase in degrees vs. normalized frequency in ×π rad/sample)]
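Frequency-response plots like these can be produced by evaluating H(e^{jω}) = Σ FC(k)·e^{−jωk} on a grid of normalized frequencies, similar to what MATLAB's `freqz` plots. The hypothetical `frequency_response` function below returns magnitude in dB and unwrapped phase in degrees at 512 points from 0 up to (but not including) π rad/sample; the grid size and the simple unwrapping loop are assumptions, not taken from the poster.

```python
import cmath
import math


def frequency_response(fc, n_points=512):
    """Evaluate H(e^{jw}) = sum_k FC(k) * e^{-jwk} for FIR coefficients fc.

    Frequencies are w = pi * i / n_points for i = 0 .. n_points - 1.
    Returns (normalized_freq, magnitude_db, phase_deg), with normalized_freq
    in units of pi rad/sample and the phase unwrapped so it is continuous.
    """
    norm_freq, mag_db, phase_deg = [], [], []
    prev = None
    offset = 0.0
    for i in range(n_points):
        w = math.pi * i / n_points
        h = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(fc))
        mag = abs(h)
        mag_db.append(20.0 * math.log10(mag) if mag > 0 else float("-inf"))
        p = cmath.phase(h) + offset
        if prev is not None:
            # Unwrap: undo the 2*pi jumps caused by cmath.phase's (-pi, pi] range.
            while p - prev > math.pi:
                p -= 2.0 * math.pi
                offset -= 2.0 * math.pi
            while p - prev < -math.pi:
                p += 2.0 * math.pi
                offset += 2.0 * math.pi
        prev = p
        phase_deg.append(math.degrees(p))
        norm_freq.append(i / n_points)
    return norm_freq, mag_db, phase_deg
```

A long filter such as the 257-tap equalizer accumulates many multiples of 360 degrees of phase, which is why unwrapped phase axes on plots like these can span tens of thousands of degrees.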

Final Values: Gain: 0.0001 Iterations: 1500 Filter Size: 257
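The parameter search described under “Creating the Filter” can be sketched as a grid search. Everything here is hypothetical: the function names, the compact LMS (which assumes one “iteration” means one pass over the aligned data), and the selection rule, which reads the poster's two criteria as “the lowest RMS error among settings whose coefficients changed by less than a tolerance when the iteration count was increased”. Retraining from scratch for each iteration count keeps the code short but is wasteful.

```python
def lms(ideal, non_ideal, filter_size, gain, iterations):
    """Compact LMS: returns (coefficients, RMS error over the last pass)."""
    fc = [0.0] * filter_size
    rms = 0.0
    end = min(len(ideal), len(non_ideal))
    for _ in range(iterations):
        sq_err, count = 0.0, 0
        for n in range(filter_size - 1, end):
            taps = [non_ideal[n - k] for k in range(filter_size)]
            e = ideal[n] - sum(c * x for c, x in zip(fc, taps))
            for k in range(filter_size):
                fc[k] += gain * e * taps[k]
            sq_err += e * e
            count += 1
        rms = (sq_err / count) ** 0.5 if count else 0.0
    return fc, rms


def sweep_parameters(ideal, non_ideal, gains, iteration_counts, filter_size=257):
    """Train one filter per (gain, iterations) pair.

    Each result is (gain, iterations, rms_error, max_coefficient_change),
    where the change is measured against the previous iteration count for
    the same gain (infinity for the first count).
    """
    results = []
    for g in gains:
        prev_fc = None
        for it in iteration_counts:
            fc, rms = lms(ideal, non_ideal, filter_size, g, it)
            if prev_fc is None:
                change = float("inf")
            else:
                change = max(abs(a - b) for a, b in zip(fc, prev_fc))
            results.append((g, it, rms, change))
            prev_fc = fc
    return results


def choose(results, tol=1e-3):
    """Hypothetical rule: lowest RMS among settings whose coefficients settled."""
    stable = [r for r in results if r[3] < tol]
    return min(stable, key=lambda r: (r[2], r[1])) if stable else None
```

With the values listed on the poster, this would be called as `sweep_parameters(ideal, recorded, [0.001, 0.0001, 0.00001], range(500, 3901, 200))`, where `ideal` and `recorded` are the aligned test signals, although pure Python would be far too slow for that grid on real recordings.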

Conclusions

The filter developed from the pink noise test signal produced a statistically significant improvement in speech recognizer performance at the 90% confidence level (from 79% to 83.5% correct with the Windows 7 recognizer). This suggests that the technique could provide a functionally significant improvement in practice, and it warrants further investigation.
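The poster does not say which statistical test supports the 90% confidence claim. One standard choice for paired before/after outcomes on the same phrases is the exact McNemar test, which looks only at the phrases whose result changed (the “Broke” and “Fixed” counts in the Windows 7 table). The sketch below applies it to those counts. The total of 200 phrases is inferred from the percentages (158/200 = 79% and 167/200 = 83.5%) rather than stated on the poster, and `mcnemar_exact` is a hypothetical helper. For the pink-noise filter it gives a two-sided p of about 0.078, below 0.10 and so consistent with the stated 90% level; for the sine-wave filter (6 fixed versus 5 broke) it gives p = 1.0.

```python
from math import comb


def mcnemar_exact(fixed, broke):
    """Two-sided exact McNemar test on discordant phrase counts.

    Under the null hypothesis that filtering makes no difference, each
    discordant phrase is equally likely to be "fixed" or "broke", so the
    smaller count follows Binomial(fixed + broke, 0.5).
    """
    n = fixed + broke
    k = min(fixed, broke)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


TOTAL = 200  # assumption: inferred from 158/0.79 = 167/0.835 = 200 phrases
for name, broke, fixed in [("sine wave", 5, 6), ("pink noise", 6, 15)]:
    recognized = 158 - broke + fixed  # unfiltered baseline was 158 phrases
    p = mcnemar_exact(fixed, broke)
    print(f"{name}: {recognized}/{TOTAL} = {recognized / TOTAL:.1%}, p = {p:.3f}")
```

Because the exact test uses only the 21 phrases whose outcome changed under the pink-noise filter, it does not depend on the inferred total; the total only affects the printed percentages.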