A STUDY ON SPEECH RECOGNITION USING DYNAMIC TIME WARPING

CS 525: Project Presentation

PALDEN LAMA and MOUNIKA NAMBURU

Posted on 19-Dec-2015



Page 2

GOALS

Learn how it works! Focus:
- Pre-processing
- Dynamic Time Warping / Dynamic Programming

Verify using MATLAB.

Build a simple voice-to-text converter application.

Page 3

HOW DOES IT WORK?

Record a voice → Digitized speech signal (.wav file) → Acoustic preprocessing (DFT + MFCC) → Extract feature vectors → Speech recognizer (Dynamic Time Warping)

Page 5

SPEECH SIGNAL

- Voiced excitation → fundamental frequency (speaker dependent)
- Loudness → signal amplitude
- Vocal tract shape → spectral shaping (most important for recognizing words)

Figure: time signal of the vowel /a:/ (fs = 11 kHz, length = 100 ms)
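The following toy source-filter sketch in Python makes the three factors above concrete. It is an illustration, not the authors' code. An impulse train at a hypothetical pitch `f0` plays the voiced excitation, a gain `amp` sets loudness, and two digital resonators stand in for the vocal tract. The resonators use classic textbook formant values for /a/ with made-up bandwidths. Sample rate and duration follow the figure (11 kHz, 100 ms); the function name and all other parameters are assumptions.

```python
import math


def synth_vowel(fs=11000, dur=0.1, f0=120.0, amp=0.5,
                formants=((730.0, 90.0), (1090.0, 110.0))):
    """Toy source-filter model of a voiced sound (all defaults are illustrative).

    f0       -> fundamental frequency of the voiced excitation (speaker dependent)
    amp      -> loudness (peak signal amplitude)
    formants -> (centre Hz, bandwidth Hz) resonances standing in for the vocal tract
    """
    n = int(round(fs * dur))
    period = int(round(fs / f0))
    # Voiced excitation: a unit impulse at the start of every pitch period.
    x = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    # Vocal tract: cascade of two-pole resonators, one per formant.
    for fc, bw in formants:
        r = math.exp(-math.pi * bw / fs)          # pole radius set by the bandwidth
        a1 = 2.0 * r * math.cos(2.0 * math.pi * fc / fs)
        a2 = -r * r
        y1 = y2 = 0.0
        out = []
        for s in x:
            y = s + a1 * y1 + a2 * y2
            out.append(y)
            y1, y2 = y, y1
        x = out
    # Loudness: scale so that the peak magnitude equals amp.
    peak = max(abs(v) for v in x) or 1.0
    return [amp * v / peak for v in x]


signal = synth_vowel()
print(len(signal))  # 100 ms at 11 kHz -> 1100 samples
```

Changing `f0` moves the harmonics (speaker identity), changing `amp` only rescales the waveform, and changing `formants` reshapes the spectral envelope. The spectral envelope is the part a word recognizer cares about most.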

Page 6

ACOUSTIC PRE-PROCESSING

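- DFT (Discrete Fourier Transform) → spectral coefficients
- Inverse DFT of the log power spectrum → cepstral coefficients
- Cepstral coefficients make it easier to extract the spectral shaping of the speech signal. The slowly varying vocal-tract envelope ends up in the low cepstral coefficients, while the pitch harmonics end up in the high ones.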

Figures: log power spectrum of the vowel /a:/ (fs = 11 kHz, N = 512); power spectrum of the vowel /a:/ after cepstral smoothing
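The following Python sketch shows the cepstral smoothing from the figures. It is not the project's MATLAB code. It uses a naive O(N²) DFT to stay dependency-free; an FFT would normally be used, especially for N = 512. The lifter length `n_keep` is a hypothetical parameter that controls how many low cepstral coefficients (the vocal-tract envelope) survive.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (dependency-free on purpose)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def real_cepstrum(frame):
    """Inverse DFT of the log power spectrum -> cepstral coefficients."""
    power = [abs(c) ** 2 for c in dft(frame)]
    log_power = [math.log(p + 1e-12) for p in power]  # small floor avoids log(0)
    return [c.real for c in idft(log_power)]


def cepstral_smoothing(frame, n_keep=20):
    """Keep only the lowest n_keep cepstral coefficients (and their mirror
    images) and transform back, giving a smoothed log power spectrum."""
    c = real_cepstrum(frame)
    n = len(c)
    lifter = [1.0 if (q < n_keep or q > n - n_keep) else 0.0 for q in range(n)]
    return [v.real for v in dft([ci * li for ci, li in zip(c, lifter)])]


frame = [math.sin(0.3 * t) + 0.5 * math.sin(1.7 * t) for t in range(64)]
envelope = cepstral_smoothing(frame, n_keep=8)  # smoothed log power, 64 bins
```

The lifter is applied symmetrically because the real cepstrum of a real signal is even. With `n_keep` equal to the frame length nothing is removed, and the original log power spectrum comes back.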

Page 7

MFCC (MEL FREQUENCY CEPSTRAL COEFFICIENTS)

The mel frequency scale reflects the frequency resolution of the human ear.

Power spectrum coefficients → mel spectral coefficients (FEATURE VECTOR)
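The following Python sketch illustrates this step under stated assumptions. The slides do not give the mel formula, so the common convention mel(f) = 2595·log10(1 + f/700) is assumed, with triangular filters spaced evenly on that scale. In the usual MFCC definition, a final log + DCT-II step turns mel spectral coefficients into cepstral coefficients. The slides do not say whether the project's `computeMelSpectrum` included that step. All function names and the filter and coefficient counts are hypothetical.

```python
import math


def hz_to_mel(f):
    # Common mel-scale convention (assumed; not given in the slides).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters equally spaced on the mel scale over bins 0..n_fft//2."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(fs / 2.0)
    mels = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    edges = [mel_to_hz(m) * n_fft / fs for m in mels]  # edges in fractional bins
    bank = []
    for k in range(n_filters):
        lo, mid, hi = edges[k], edges[k + 1], edges[k + 2]
        row = []
        for b in range(n_bins):
            if lo < b <= mid:
                row.append((b - lo) / (mid - lo))      # rising slope
            elif mid < b < hi:
                row.append((hi - b) / (hi - mid))      # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(power_spectrum, bank, n_ceps=12):
    """Mel spectral coefficients -> log -> DCT-II = MFCC feature vector."""
    mel_energies = [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]
    log_mel = [math.log(e + 1e-12) for e in mel_energies]
    m = len(log_mel)
    return [sum(log_mel[j] * math.cos(math.pi * i * (j + 0.5) / m) for j in range(m))
            for i in range(n_ceps)]


bank = mel_filterbank(n_filters=20, n_fft=512, fs=11000)
power = [1.0] * (512 // 2 + 1)           # placeholder power spectrum of one frame
features = mfcc(power, bank, n_ceps=12)  # one 12-dimensional feature vector
```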

Page 8

RECOGNIZER

One spoken word yields dozens of feature vectors (pre-processing is applied to every 10 ms of signal).

Compute a "distance" between this unknown sequence of vectors (the unknown word) and known sequences of vectors (prototypes of the words to recognize).

PROBLEM: the vector sequences have unequal lengths.
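Both the "dozens of feature vectors" and the unequal-length problem follow directly from framing. Below is a minimal Python sketch, not the project's code. One frame starts every 10 ms as on the slide, but the 25 ms window length and the function name are assumptions. Two recordings of the same word that differ by only 100 ms already give sequences of different lengths, so the vectors cannot simply be compared position by position.

```python
def frame_signal(signal, fs, frame_ms=25.0, hop_ms=10.0):
    """Split a sampled signal into overlapping frames, one every hop_ms milliseconds.

    hop_ms=10 follows the slide; frame_ms=25 is an assumed window length.
    Each frame would then be turned into one feature vector.
    """
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    if len(signal) < frame_len:
        return []
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


fs = 11000
short_take = frame_signal([0.0] * 5500, fs)  # 0.5 s utterance
long_take = frame_signal([0.0] * 6600, fs)   # 0.6 s utterance
print(len(short_take), len(long_take))  # 48 58
```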

Page 9

DYNAMIC TIME WARPING : FIND OPTIMAL ASSIGNMENT PATH

Page 10

DYNAMIC TIME WARPING : FIND OPTIMAL ASSIGNMENT PATH

Page 11

DYNAMIC TIME WARPING : FIND OPTIMAL ASSIGNMENT PATH
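The optimal assignment path shown on these slides can be found by dynamic programming. The Python sketch below uses the classic DTW recursion with horizontal, diagonal and vertical steps: D(i, j) = d(i, j) + min(D(i-1, j), D(i-1, j-1), D(i, j-1)). It also backtracks to recover the path and makes a nearest-prototype decision for isolated words. This is an illustration under assumptions. The project's own `dp_asym` (an asymmetric variant, judging by its name) may use different step weights or constraints, and the Euclidean local distance is an assumed choice.

```python
import math


def dtw(a, b, dist=math.dist):
    """Classic DTW between feature-vector sequences a (length n) and b (length m).

    Allowed predecessors of (i, j): (i-1, j), (i-1, j-1), (i, j-1).
    Returns (accumulated distance, optimal assignment path as a list of (i, j)).
    """
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(a[i], b[j])  # local distance between two feature vectors
            if i == 0 and j == 0:
                D[i][j] = d
                continue
            best = min(
                D[i - 1][j] if i > 0 else INF,
                D[i - 1][j - 1] if i > 0 and j > 0 else INF,
                D[i][j - 1] if j > 0 else INF,
            )
            D[i][j] = best + d
    # Backtrack from the end point to (0, 0) along the cheapest predecessors.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((D[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((D[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((D[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return D[n - 1][m - 1], path


def recognize(unknown, prototypes):
    """Isolated-word decision: the prototype with the smallest DTW distance wins."""
    return min(prototypes, key=lambda word: dtw(unknown, prototypes[word])[0])


print(dtw([[1], [2], [3]], [[1], [2], [2], [3]]))  # (0.0, [(0, 0), (1, 1), (1, 2), (2, 3)])
prototypes = {"up": [[1], [2], [3]], "down": [[3], [2], [1]]}
print(recognize([[1], [1], [2], [3], [3]], prototypes))  # up
```

The warping lets frame 1 of the first sequence absorb two frames of the second. Sequences of different lengths therefore still get a distance of 0 when they differ only in timing.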

Page 12

DTW : RECOGNIZING CONNECTED WORDS
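The slide does not detail how DTW is extended to connected words. One standard approach is the one-stage dynamic programming algorithm. It runs DTW against all word prototypes in parallel and, at the first frame of every prototype, also allows entry from the cheapest word end at the previous input frame. The Python sketch below is a simplified version of that idea, not the project's implementation. It assumes one non-empty prototype per word and has no silence model or word-insertion penalty. It carries the recognized word history with each partial path instead of storing backpointers, which is simple but memory-hungry.

```python
import math


def one_stage_dp(seq, templates, dist=math.dist):
    """Connected-word recognition with a simplified one-stage DP (sketch).

    seq:       feature vectors of the whole utterance
    templates: dict word -> non-empty list of feature vectors (one prototype per word)
    Returns (total distance, recognized word sequence as a tuple).
    """
    words = list(templates)
    INF = float("inf")
    prev_cost, prev_hist = {}, {}
    best_end = None  # (cost, history) of the cheapest word end at the previous frame
    for i, x in enumerate(seq):
        cur_cost, cur_hist = {}, {}
        for w in words:
            t = templates[w]
            cost, hist = [INF] * len(t), [()] * len(t)
            for j, y in enumerate(t):
                cands = []
                if i == 0 and j == 0:
                    cands.append((0.0, ()))                            # path start
                if i > 0:
                    cands.append((prev_cost[w][j], prev_hist[w][j]))   # (i-1, j)
                    if j > 0:
                        cands.append((prev_cost[w][j - 1], prev_hist[w][j - 1]))  # (i-1, j-1)
                    else:
                        cands.append(best_end)                         # word boundary
                if j > 0:
                    cands.append((cost[j - 1], hist[j - 1]))           # (i, j-1)
                c, h = min(cands, key=lambda p: p[0])
                cost[j], hist[j] = c + dist(x, y), h
            cur_cost[w], cur_hist[w] = cost, hist
        # Cheapest word end at frame i; that word is appended to its history.
        best_end = min(((cur_cost[w][-1], cur_hist[w][-1] + (w,)) for w in words),
                       key=lambda p: p[0])
        prev_cost, prev_hist = cur_cost, cur_hist
    return best_end


templates = {"a": [[0], [0]], "b": [[5], [5]]}
print(one_stage_dp([[0], [0], [5], [5]], templates))  # (0.0, ('a', 'b'))
```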

Page 13

MATLAB FUNCTIONS

PRE-PROCESSING
    recordMelMatrix(3)
    S = wavread('speech.wav');
    C = Melfiltermatrix(S, N, K);
    computeMelSpectrum(C, S);

DISPLAY FEATURES
    Featuredisp.m

WORD RECOGNITION
    dp_asym(vector1, vector2)

Page 14

RESULTS

Figures: hello, hello1

Page 15

Figures: library, hello

Page 16

Figures: computer, hello

Page 17

3.0304e+003

3.5820e+003

3.4499e+003

Page 18

Figures: Welcome home (male), Welcome home (female)

Page 19

Figures: Welcome home, Welcome back

Page 20

Figures: Welcome home, Computer Science

Page 21

Figures: Welcome back, Computer Science

Page 22

2.6418e+003

2.9468e+003

3.8109e+003

4.6701e+003

Page 23

THANKS! ANY QUESTIONS?