
© Florida Institute of Technology

Speech Processing and Recognition

Access audio data in real time and apply to speech recognition

Final Exam Project by Hesheng Li

Instructor: Dr. Kepuska

Department of Electrical and Computer Engineering

Overview

Introduction

Three models to access live audio data

How to get audio data by using the low-level API model?

Application in speech recognition

Comparison and Analysis

Conclusion

Introduction

Why? How?

Live audio data access has a wide range of applications!

Three models to access live audio data

High-level digital audio API: MCI

DirectSound

Low-level digital audio API: WaveX

High-level digital audio API: MCI

The Media Control Interface (MCI) provides standard commands for playing multimedia devices and recording multimedia resource files.

There are two ways to send a command to a device:

1. Command message interface

2. Command string interface

Command message interface

Passing binary values and structures to an audio device is referred to as using the "command message interface".

We use the function mciSendCommand() to send commands using this approach.

Example:

waveParams.lpstrElementName = "C:\\WINDOWS\\CHORD.WAV";
mciSendCommand(0, MCI_OPEN, MCI_WAIT | MCI_OPEN_ELEMENT | MCI_OPEN_TYPE | MCI_OPEN_TYPE_ID, (DWORD_PTR)(LPVOID)&waveParams);

Command string interface

Passing strings to an audio device is referred to as using the "command string interface".

We use the function mciSendString() to send commands using this approach.

Example:

mciSendString("open C:\\WINDOWS\\CHORD.WAV type waveaudio alias A_Chord", 0, 0, 0);

MCI

Some other commands:

Command message interface:

1. Start recording with MCI_RECORD

2. Write data to a wave file with MCI_SAVE

3. Stop with MCI_STOP

4. Play with MCI_PLAY

Command string interface:

1. Play with "play %s %s %s"

2. Stop with "stop %s %s %s"
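The command strings above are ordinary text, so they can be assembled with snprintf() before being handed to mciSendString(). The following is only a sketch: build_mci_open() and build_mci_verb() are hypothetical helpers (not part of MCI), and the sketch assumes the file path contains no spaces, as in the CHORD.WAV example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of MCI): formats an "open" command string
   in the same shape as the CHORD.WAV example.
   Returns 0 on success, -1 if the output buffer is too small. */
int build_mci_open(char *out, size_t out_size, const char *path, const char *alias)
{
    assert(out != NULL && path != NULL && alias != NULL);
    int n = snprintf(out, out_size, "open %s type waveaudio alias %s", path, alias);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Hypothetical helper: formats a one-verb command such as "play A_Chord"
   or "stop A_Chord" that refers to a previously opened alias. */
int build_mci_verb(char *out, size_t out_size, const char *verb, const char *alias)
{
    assert(out != NULL && verb != NULL && alias != NULL);
    int n = snprintf(out, out_size, "%s %s", verb, alias);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

On Windows, the resulting buffer would then be passed as the first argument of mciSendString(), with the remaining arguments set to 0 as in the example on the previous slide.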

DirectSound

Like other components of DirectX, DirectSound allows you to use the hardware in the most efficient way.

Here are some other things that DirectSound makes easy:

Querying hardware capabilities at run time to determine the best solution for any given personal computer configuration

Using property sets so that new hardware capabilities can be exploited even when they are not directly supported by DirectSound

Low-latency mixing of audio streams for rapid response

Implementing three-dimensional (3-D) sound

DirectSound

DirectSound playback is built on the IDirectSound Component Object Model (COM) interface and on the IDirectSoundBuffer interface for manipulating sound buffers.

DirectSound capture is based on the IDirectSoundCapture and IDirectSoundCaptureBuffer COM interfaces.

Low-level digital audio API: WaveX

1. Open audio device

2. Prepare structure for recording

3. Start recording

4. Data processing

5. Release structure

6. Close audio device

Open Audio Device

There are several different approaches you can take, depending upon how fancy and flexible you want your program to be.

1. Pass the value WAVE_MAPPER to open the "preferred" audio input/output device.

2. Call a function to get the list of devices, then open the audio device you want.

3. The functions that open a device are waveInOpen() and waveOutOpen().

EXAMPLE

result = waveInOpen(&outHandle, WAVE_MAPPER, &waveFormat, (DWORD_PTR)myWindow, 0, CALLBACK_WINDOW);
if (result)
{
    printf("There was an error opening the preferred Digital Audio In device!\r\n");
}

EXAMPLE

iNumDevs = waveInGetNumDevs();
for (i = 0; i < iNumDevs; i++) {
    /* Query input devices, to match waveInGetNumDevs() and waveInOpen() */
    if (!waveInGetDevCaps(i, &wic, sizeof(WAVEINCAPS)))
    {
        printf("Device ID #%u: %s\r\n", i, wic.szPname);
    }
}
/* Valid device IDs run from 0 to iNumDevs - 1; 0 selects the first device listed */
result = waveInOpen(&outHandle, 0, &waveFormat, (DWORD_PTR)myWindow, 0, CALLBACK_WINDOW);

Structure WAVEFORMATEX

Field              Meaning
wFormatTag         Format type: PCM, mu-law, A-law
nChannels          Mono or stereo
nSamplesPerSec     Sample rate, e.g. 8000 Hz
nAvgBytesPerSec    Average data-transfer rate
nBlockAlign        Minimum atomic unit of data
wBitsPerSample     8 or 16 bits per sample
cbSize             Size, in bytes, of extra format information

Example

WAVEFORMATEX waveFormat;

/* Initialize the WAVEFORMATEX for 16-bit, 44.1 kHz, stereo */
waveFormat.wFormatTag = WAVE_FORMAT_PCM;
waveFormat.nChannels = 2;
waveFormat.nSamplesPerSec = 44100;
waveFormat.wBitsPerSample = 16;
waveFormat.nBlockAlign = waveFormat.nChannels * (waveFormat.wBitsPerSample / 8);
waveFormat.nAvgBytesPerSec = waveFormat.nSamplesPerSec * waveFormat.nBlockAlign;
waveFormat.cbSize = 0;
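The two derived fields follow directly from the other three, which is easy to check with numbers. The sketch below is illustrative only: PcmFormat is a stand-in structure that mirrors the PCM-related field names of WAVEFORMATEX so that it compiles without the Windows headers, and make_pcm_format() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the PCM-relevant fields of WAVEFORMATEX, so the sketch
   compiles without <windows.h>. Field names mirror the real structure. */
typedef struct {
    uint16_t nChannels;
    uint32_t nSamplesPerSec;
    uint32_t nAvgBytesPerSec;
    uint16_t nBlockAlign;
    uint16_t wBitsPerSample;
} PcmFormat;

/* Hypothetical helper: fills the two derived fields from the three inputs,
   using the same formulas as the example above. */
PcmFormat make_pcm_format(uint16_t channels, uint32_t sample_rate, uint16_t bits)
{
    assert(channels > 0 && bits > 0 && bits % 8 == 0);
    PcmFormat f;
    f.nChannels = channels;
    f.nSamplesPerSec = sample_rate;
    f.wBitsPerSample = bits;
    /* One block holds one sample for every channel. */
    f.nBlockAlign = (uint16_t)(channels * (bits / 8));
    /* Bytes per second = blocks per second * bytes per block. */
    f.nAvgBytesPerSec = sample_rate * f.nBlockAlign;
    return f;
}
```

For the 16-bit, 44.1 kHz, stereo format this gives nBlockAlign = 4 and nAvgBytesPerSec = 176400. A 16-bit, 8000 Hz, mono format, closer to what a speech recognizer might use, gives 2 and 16000.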

Recording engine

Figure: the audio device fills buffer1, buffer2, buffer3 and buffer4. Each full buffer produces a message (msg) to the callback function, which runs the data processing. Other labels in the figure: waveInAddBuffer(), waveInStart().

Recording engine

Figure: the same engine with the buffers in the order buffer2, buffer3, buffer4, buffer1. The audio device fills them, and each msg triggers data processing.

Circular buffer
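The rotation of the four buffers can be modelled without any audio hardware. The sketch below is a simplified simulation, not the WaveX API: BufferRing, ring_init(), ring_device_fill() and ring_requeue() are hypothetical names, and the "device" is just a function that completes the next queued buffer.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 4   /* four buffers, as in the figure */

/* Hypothetical model of the ring: each slot is either queued at the
   device (waiting to be filled) or held by the application. */
typedef struct {
    int queued[RING_SIZE];   /* 1 = owned by the device, 0 = owned by the app */
    size_t next_fill;        /* slot the device will fill next */
    size_t filled_count;     /* total number of buffers the device completed */
} BufferRing;

/* Queue every buffer at the device before recording starts. */
void ring_init(BufferRing *r)
{
    assert(r != NULL);
    for (size_t i = 0; i < RING_SIZE; i++) {
        r->queued[i] = 1;
    }
    r->next_fill = 0;
    r->filled_count = 0;
}

/* Device side: completes the next queued buffer and returns its index,
   or -1 if the application has not returned that buffer yet
   (in a real recorder, audio would be lost at this point). */
int ring_device_fill(BufferRing *r)
{
    assert(r != NULL);
    size_t slot = r->next_fill;
    if (!r->queued[slot]) {
        return -1;
    }
    r->queued[slot] = 0;
    r->next_fill = (slot + 1) % RING_SIZE;
    r->filled_count++;
    return (int)slot;
}

/* Application side: after processing, hand the buffer back to the device. */
void ring_requeue(BufferRing *r, int slot)
{
    assert(r != NULL && slot >= 0 && slot < RING_SIZE);
    r->queued[slot] = 1;
}
```

As long as the application requeues each buffer before the device comes round to it again, ring_device_fill() keeps returning 0, 1, 2, 3, 0, 1 and so on. If processing falls a full ring behind, it returns -1, which is the situation the recording engine has to avoid.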

1+3+1

Three important methods:

Prepare a buffer for wave-audio input. Function: waveInPrepareHeader()

Send the buffer to the audio device; when the buffer is full, the application is notified. Function: waveInAddBuffer()

Start recording. Function: waveInStart()

Example

if (MMSYSERR_NOERROR != waveInPrepareHeader(m_hWaveIn, &waveheader, sizeof(WAVEHDR)))
{
    printf("prepare buffer failure!");
}
waveInAddBuffer(m_hWaveIn, &waveheader, sizeof(WAVEHDR));
waveInStart(m_hWaveIn);
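Before a WAVEHDR is prepared, it has to point at a data buffer of a chosen length. The slides do not say how that length was chosen, so the following is only one plausible approach: pick a duration in milliseconds and convert it to bytes using the format's nAvgBytesPerSec and nBlockAlign values. buffer_bytes_for_ms() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes needed to hold `ms` milliseconds of
   audio, rounded down to a whole number of blocks so that no sample is
   split across two buffers. */
uint32_t buffer_bytes_for_ms(uint32_t avg_bytes_per_sec, uint16_t block_align, uint32_t ms)
{
    assert(block_align > 0);
    /* 64-bit intermediate avoids overflow for long durations. */
    uint64_t bytes = (uint64_t)avg_bytes_per_sec * ms / 1000;
    bytes -= bytes % block_align;
    return (uint32_t)bytes;
}
```

For 16-bit, 8000 Hz, mono audio (16000 bytes per second, block size 2), a 10 ms buffer is 160 bytes. For the 44.1 kHz stereo format (176400 bytes per second, block size 4), 25 ms works out to 4410 bytes, which is rounded down to 4408.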

Message

Windows messages:

MM_WIM_DATA: this message is sent to a window when data is present in the buffer and the buffer is being returned to the application.

Other messages: MM_WIM_CLOSE, MM_WIM_OPEN, MM_WOM_CLOSE, MM_WOM_DONE, MM_WOM_OPEN

Callback function messages:

WIM_DATA: this message is sent to the given callback function when data is present in the input buffer and the buffer is being returned to the application.

Other messages: WIM_CLOSE, WIM_OPEN, WOM_CLOSE, WOM_DONE, WOM_OPEN

Message Example

Callback message

waveInOpen(&m_hWaveIn, WAVE_MAPPER, &m_Format, (DWORD_PTR)waveInProc, 0L, CALLBACK_FUNCTION);

waveInProc(…..) {
    switch (msg) {
    case WIM_OPEN: ………….
        break;
    case WIM_DATA: ………….
        break;
    case WIM_CLOSE: …………
        break;
    }
}

Window message

waveInOpen(&m_hWaveIn, WAVE_MAPPER, &m_Format, (DWORD_PTR)hWnd, 0L, CALLBACK_WINDOW);

Application in Real-time Key Word Recognition

Figure: block diagram with the blocks Front-End, Audio Interface, Back-End, Training/Testing/Analysis, Key-Word Recognizer and Monitor.

12/18/2003

To be continued…

Practical problems when we apply this model in speech recognition

1. Asynchronism

2. Efficiency

Figure: a row of buffers (buffer1, buffer2, buffer3, buffer4, …, buffer500). Each msg triggers a call to data processing.

Comparison and Analysis

MCI is the easiest model and very convenient, but it offers the least amount of control ("file level").

WaveX is more complicated, but it allows flexible control of audio data ("buffer level").

DirectSound is the most efficient method, but also the most complicated ("buffer level").

Conclusion

Apply MCI to the audio document part of "video conference".

Apply WaveX to real-time speech recognition and also to "video conference".

DirectSound is widely used in computer game design.
