© Florida Institute of Technology
Speech Processing and Recognition
Access audio data in real time and apply to speech recognition
Final Exam Project
By Hesheng Li
Instructor: Dr. Kepuska
Department of Electrical and Computer Engineering
Overview
Introduction
Three models to access live audio data
How to get audio data by using the low-level API model?
Application in speech recognition
Comparison and Analysis
Conclusion
Introduction
Why? How?
Live audio data access has a wide range of applications!
Three models to access live audio data
High-level digital audio API: MCI
DirectSound
Low-level digital audio API: waveX
High-level digital audio API: MCI
The Media Control Interface (MCI) provides standard commands for playing multimedia devices and recording multimedia resource files.
There are two ways to send a command to a device:
1. Command message interface
2. Command string interface
Command message interface
Passing binary values and structures to an audio device is referred to as using the "command message interface".
We use the function mciSendCommand() to send commands using this approach.
Example:
waveParams.lpstrElementName = "C:\\WINDOWS\\CHORD.WAV";
mciSendCommand(0, MCI_OPEN, MCI_WAIT | MCI_OPEN_ELEMENT | MCI_OPEN_TYPE | MCI_OPEN_TYPE_ID, (DWORD_PTR)(LPVOID)&waveParams);
Command string interface
Passing strings to an audio device is referred to as using the "command string interface".
We use the function mciSendString() to send commands using this approach.
Example:
mciSendString("open C:\\WINDOWS\\CHORD.WAV type waveaudio alias A_Chord", 0, 0, 0);
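The command string passed to mciSendString() is ordinary text, so it can be assembled at run time. The following is a minimal sketch, assuming the "open ... type waveaudio alias ..." form shown in the example; the helper name build_mci_open_command and its signature are hypothetical and not part of MCI. It only builds the string and does not call mciSendString() itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build an MCI "open" command string of the form used in the slide.
 * Returns 0 on success, -1 if the result would not fit in `out`.
 * Hypothetical helper for illustration; not an MCI function. */
int build_mci_open_command(char *out, size_t out_size,
                           const char *path, const char *alias)
{
    int n = snprintf(out, out_size,
                     "open %s type waveaudio alias %s", path, alias);
    if (n < 0 || (size_t)n >= out_size) {
        return -1;  /* formatting error or truncated output */
    }
    return 0;
}
```

On Windows the resulting buffer would be passed as the first argument of mciSendString(). Checking for truncation matters because a cut-off command string would be sent to the device as if it were complete.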
MCI
Some other commands:
Command message interface:
1. Start recording with MCI_RECORD
2. Write data to a wave file with MCI_SAVE
3. Stop with MCI_STOP
4. Play with MCI_PLAY
Command string interface:
1. Play with "play %s %s %s"
2. Stop with "stop %s %s %s"
DirectSound
Like other components of DirectX, DirectSound allows you to use the hardware in the most efficient way.
Here are some other things that DirectSound makes easy:
1. Querying hardware capabilities at run time to determine the best solution for any given personal computer configuration
2. Using property sets so that new hardware capabilities can be exploited even when they are not directly supported by DirectSound
3. Low-latency mixing of audio streams for rapid response
4. Implementing three-dimensional (3-D) sound
DirectSound
DirectSound playback is built on the IDirectSound Component Object Model (COM) interface and on the IDirectSoundBuffer interface for manipulating sound buffers.
DirectSound capture is based on the IDirectSoundCapture and IDirectSoundCaptureBuffer COM interfaces.
Low-level digital audio API: waveX
1. Open audio device
2. Prepare structure for recording
3. Start recording
4. Data processing
5. Release structure
6. Close audio device
Open Audio Device
There are several different approaches you can take, depending upon how fancy and flexible you want your program to be.
1. Pass the value WAVE_MAPPER to open the "preferred" audio input/output device.
2. Call a function to get the list of devices, then open the audio device you want.
3. In both cases, open the device with waveInOpen() (input) or waveOutOpen() (output).
EXAMPLE
result = waveInOpen(&outHandle, WAVE_MAPPER, &waveFormat, (DWORD_PTR)myWindow, 0, CALLBACK_WINDOW);
if (result)
{
    printf("There was an error opening the preferred Digital Audio In device!\r\n");
}
EXAMPLE
/* List the input devices; wic is a WAVEINCAPS structure */
iNumDevs = waveInGetNumDevs();
for (i = 0; i < iNumDevs; i++) {
    if (!waveInGetDevCaps(i, &wic, sizeof(WAVEINCAPS)))
    { printf("Device ID #%u: %s\r\n", i, wic.szPname); }
}
/* iDevice is the ID chosen from the list above (0 <= iDevice < iNumDevs) */
result = waveInOpen(&outHandle, iDevice, &waveFormat, (DWORD_PTR)myWindow, 0, CALLBACK_WINDOW);
Structure WAVEFORMATEX
wFormatTag: PCM, mu-law, A-law
nChannels: mono, stereo
nSamplesPerSec: sample rate, e.g. 8000 Hz
nAvgBytesPerSec: average data-transfer rate
nBlockAlign: minimum atomic unit of data
wBitsPerSample: 8 bits or 16 bits per sample
cbSize: size of the extra format information
Example
WAVEFORMATEX waveFormat;
/* Initialize the WAVEFORMATEX for 16-bit, 44.1 kHz, stereo */
waveFormat.wFormatTag = WAVE_FORMAT_PCM;
waveFormat.nChannels = 2;
waveFormat.nSamplesPerSec = 44100;
waveFormat.wBitsPerSample = 16;
waveFormat.nBlockAlign = waveFormat.nChannels * (waveFormat.wBitsPerSample / 8);
waveFormat.nAvgBytesPerSec = waveFormat.nSamplesPerSec * waveFormat.nBlockAlign;
waveFormat.cbSize = 0;
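The two derived fields in this example follow directly from the other three. The sketch below reproduces that arithmetic in portable C so it can be checked away from Windows; PcmFormat and make_pcm_format are hypothetical stand-ins for illustration, not the real WAVEFORMATEX from the Windows headers.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for the WAVEFORMATEX fields used in the example.
 * Hypothetical type for this sketch; comments name the real fields. */
typedef struct {
    uint16_t channels;          /* nChannels */
    uint32_t samples_per_sec;   /* nSamplesPerSec */
    uint16_t bits_per_sample;   /* wBitsPerSample */
    uint16_t block_align;       /* nBlockAlign */
    uint32_t avg_bytes_per_sec; /* nAvgBytesPerSec */
} PcmFormat;

/* Fill in a PCM format, deriving block alignment and data rate. */
PcmFormat make_pcm_format(uint16_t channels, uint32_t rate, uint16_t bits)
{
    PcmFormat f;
    f.channels = channels;
    f.samples_per_sec = rate;
    f.bits_per_sample = bits;
    /* One block holds one sample for every channel. */
    f.block_align = (uint16_t)(channels * (bits / 8));
    /* Bytes per second = blocks per second * bytes per block. */
    f.avg_bytes_per_sec = rate * f.block_align;
    return f;
}
```

For the 16-bit, 44.1 kHz, stereo case this gives a block alignment of 4 bytes and a data rate of 176400 bytes per second; for 16-bit, 8000 Hz, mono speech it gives 2 bytes and 16000 bytes per second.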
Recording engine
[Figure: the audio device fills buffer1, buffer2, buffer3 and buffer4 (waveInAddBuffer(), waveInStart()); each full buffer is reported by a message (msg) to the callback function, which hands it to data processing.]
Recording engine
[Figure: circular buffer. buffer2, buffer3, buffer4 and buffer1 rotate between the audio device and the callback function (msg), which hands each full buffer to data processing.]
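The rotation of buffers between the device and the callback can be simulated without any audio hardware. The sketch below is an assumption-laden model of the recording engine, not the waveX API itself: simulate_recording stands in for the device, and the BufferCallback type and log_buffer function are hypothetical names for the callback side.

```c
#include <assert.h>

#define NUM_BUFFERS 4   /* buffer1..buffer4 in the figure, indexed 0..3 here */
#define LOG_CAPACITY 32

/* Stand-in for the callback: called once per full buffer. */
typedef void (*BufferCallback)(int buffer_index, void *user);

/* Simulate the device filling `fills` buffers in turn. Each full buffer is
 * handed to `on_full`; the device then moves on to the next buffer and
 * wraps around after the last one, which is what makes the queue circular. */
void simulate_recording(int fills, BufferCallback on_full, void *user)
{
    int current = 0;
    for (int i = 0; i < fills; i++) {
        on_full(current, user);
        current = (current + 1) % NUM_BUFFERS;
    }
}

/* Records the order in which buffers reach the callback. */
typedef struct {
    int order[LOG_CAPACITY];
    int count;
} BufferLog;

void log_buffer(int buffer_index, void *user)
{
    BufferLog *log = (BufferLog *)user;
    if (log->count < LOG_CAPACITY) {
        log->order[log->count++] = buffer_index;
    }
}
```

In a real recorder the callback side would also return each buffer to the device (waveInAddBuffer()) once its data has been consumed; the simulation treats that as implicit.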
1+3+1
Three important methods:
1. Prepare a buffer for wave-audio input. Function: waveInPrepareHeader()
2. Send the buffer to the audio device; when the buffer is full, the application is notified. Function: waveInAddBuffer()
3. Start recording. Function: waveInStart()
Example
if (MMSYSERR_NOERROR != waveInPrepareHeader(m_hWaveIn, &waveheader, sizeof(WAVEHDR)))
{
    printf("prepare buffer failure!");
}
waveInAddBuffer(m_hWaveIn, &waveheader, sizeof(WAVEHDR));
waveInStart(m_hWaveIn);
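The slides do not say how large each recording buffer is. One plausible choice, assumed here rather than taken from the source, is to size every buffer for a fixed duration and set the dwBufferLength field of WAVEHDR accordingly. The helper buffer_bytes_for_ms below is hypothetical and only does the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to hold `ms` milliseconds of audio, rounded down to a whole
 * number of blocks so that no sample is split across buffers.
 * avg_bytes_per_sec and block_align correspond to nAvgBytesPerSec and
 * nBlockAlign. Returns 0 if block_align is 0. */
uint32_t buffer_bytes_for_ms(uint32_t avg_bytes_per_sec,
                             uint16_t block_align, uint32_t ms)
{
    if (block_align == 0) {
        return 0;
    }
    /* 64-bit intermediate avoids overflow for long durations. */
    uint64_t bytes = (uint64_t)avg_bytes_per_sec * ms / 1000u;
    return (uint32_t)(bytes - bytes % block_align);
}
```

For 16-bit, 8000 Hz, mono speech (16000 bytes per second, block alignment 2), a 20 ms buffer works out to 320 bytes. Shorter buffers lower latency but cause more callbacks per second, which is a trade-off to settle by measurement.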
Message
Windows messages:
MM_WIM_DATA: this message is sent to a window when data is present in the buffer and the buffer is being returned to the application.
Other messages: MM_WIM_CLOSE, MM_WIM_OPEN, MM_WOM_CLOSE, MM_WOM_DONE, MM_WOM_OPEN
Callback function messages:
WIM_DATA: this message is sent to the given callback function when data is present in the input buffer and the buffer is being returned to the application.
Other messages: WIM_CLOSE, WIM_OPEN, WOM_CLOSE, WOM_DONE, WOM_OPEN
Message Example
Callback message:
waveInOpen(&m_hWaveIn, WAVE_MAPPER, &m_Format, (DWORD_PTR)waveInProc, 0L, CALLBACK_FUNCTION);
waveInProc(…..) {
    switch (msg) {
    case WIM_OPEN: ………….
        break;
    case WIM_DATA: ………….
        break;
    case WIM_CLOSE: …………
    }
}
Window message:
waveInOpen(&m_hWaveIn, WAVE_MAPPER, &m_Format, (DWORD_PTR)hWnd, 0L, CALLBACK_WINDOW);
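The bodies of the cases in waveInProc are elided on the slide. As an illustration of the dispatch pattern only, the sketch below handles the three messages against a small state record. The SKETCH_ message codes, RecorderState and handle_wave_in_message are all hypothetical names for this example; real code would switch on WIM_OPEN, WIM_DATA and WIM_CLOSE from the Windows multimedia headers.

```c
#include <assert.h>

/* Stand-in message codes for this sketch only. */
enum {
    SKETCH_WIM_OPEN = 1,
    SKETCH_WIM_DATA,
    SKETCH_WIM_CLOSE
};

/* Minimal recorder state updated by the handler. */
typedef struct {
    int is_open;           /* 1 between open and close */
    int buffers_received;  /* full buffers seen while open */
} RecorderState;

/* Mirrors the shape of the waveInProc switch: update state for each
 * message and return quickly. Unknown messages are ignored. */
void handle_wave_in_message(RecorderState *state, int msg)
{
    switch (msg) {
    case SKETCH_WIM_OPEN:
        state->is_open = 1;
        break;
    case SKETCH_WIM_DATA:
        if (state->is_open) {
            state->buffers_received++;
        }
        break;
    case SKETCH_WIM_CLOSE:
        state->is_open = 0;
        break;
    default:
        break;
    }
}
```

Keeping the handler this small is a design choice rather than something the slides prescribe: the less work done per message, the sooner the buffer can go back to the device.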
Application in Real-time Key Word Recognition
[Figure: Key-Word Recognizer block diagram with the blocks Audio Interface, Front-End, Back-End, Monitor and Training/Testing/Analysis.]
12/18/2003
To be continued…
Application in Real-time Key Word Recognition
Practical problems when we apply this model in speech recognition:
1. Asynchronism
2. Efficiency
Application in Real-time Key Word Recognition
[Figure: buffer1, buffer2, buffer3, buffer4, … buffer500; each full buffer is reported by a message (msg) to the callback function, which calls data processing.]
Comparison and Analysis
MCI is the easiest model and very convenient, but it offers the least control (file level).
waveX is more complicated, but allows flexible control of the audio data (buffer level).
DirectSound is the most efficient method, but also the most complicated (buffer level).
Conclusion
Apply MCI to the audio document part of "video conference".
Apply waveX to real-time speech recognition and also to "video conference".
DirectSound is widely used in computer game design.