in collaboration with

Hualin Gao, Richard Duncan, Julie A. Baca, Joseph Picone

Human and Systems Engineering

Center for Advanced Vehicular Systems

Mississippi State University

SIGNAL PROCESSING TOOLS FOR SPEECH RECOGNITION

Presented by Richard Duncan

Tablet PC

Microsoft Corporation

WHICH TWO ARE THE SAME PHONEME?

We need to extract meaningful information from the signal for a speech recognition system to model.

a: “ow”, b: “aa”, c: “ow”

WHAT IS AN ACOUSTIC FRONT-END?

It encapsulates the signal processing of a speech recognition system.

It computes a sequence of feature vectors from an audio stream.

These vectors are then processed by HMMs, neural networks, or other classifiers.

WHY REINVENT THE WHEEL?

A front-end has many areas of complexity:

• Run-time efficiency

• File I/O

• Data management (framing)

• DSP algorithm complexity

• Algorithm re-use

Our system shields the researcher or student from these mundane issues so that he or she can focus on the algorithms.

DATA FRAMING

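[Figure: two consecutive frames, frame n and frame n+1, with their analysis windows, window n and window n+1. Labels in the figure: new data, shared data.]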
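The framing idea can be sketched in a few lines of C++. This is an illustration only, not the ISIP implementation: the function name `frameSignal`, the choice to centre each window on its frame, and the zero padding at the signal edges are all assumptions. Each frame contributes `frame_len` new samples, and the rest of the window is data shared with the neighbouring frames.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Split a signal into overlapping analysis windows (hypothetical helper).
// frame_len:  number of new samples contributed by each frame
// window_len: number of samples analysed per frame (>= frame_len)
// Assumption: each window is roughly centred on its frame, and samples
// that fall outside the signal are treated as zero.
std::vector<std::vector<float>> frameSignal(const std::vector<float>& signal,
                                            std::size_t frame_len,
                                            std::size_t window_len) {
  assert(frame_len > 0 && window_len >= frame_len);
  const long pad = static_cast<long>((window_len - frame_len) / 2);
  const long size = static_cast<long>(signal.size());
  const std::size_t num_frames = (signal.size() + frame_len - 1) / frame_len;

  std::vector<std::vector<float>> windows;
  for (std::size_t n = 0; n < num_frames; ++n) {
    std::vector<float> w(window_len, 0.0f);
    for (std::size_t k = 0; k < window_len; ++k) {
      // position of the k-th window sample in the signal (may be outside it)
      const long idx = static_cast<long>(n * frame_len + k) - pad;
      if (idx >= 0 && idx < size) {
        w[k] = signal[static_cast<std::size_t>(idx)];
      }
    }
    windows.push_back(w);
  }
  return windows;
}
```

With a frame of 2 samples and a window of 4, consecutive windows share 2 samples, so only half of each window is new data.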

FEATURES OF ISIP FOUNDATION CLASSES

• Efficient memory management and tracking;

• System and I/O libraries that abstract details of the operating system;

• Math classes that provide basic linear algebra and efficient matrix manipulations;

• Generic data structures;

• Built-in unit tests to verify component correctness.

DESIGN REQUIREMENTS

• A library of standard algorithms provides basic digital signal processing (DSP) functions;

• New algorithms can be added without modifying existing classes;

• A block diagram tool allows rapid prototyping without programming or recompiling;

• The same system is used for offline feature extraction, recognition, and general DSP work.

BASIC DIGITAL PROCESSING FUNCTIONS

This example shows how to use one of the basic digital signal processing functions. It computes the energy of an input vector in dB using the SUM algorithm:

// declare an Energy object, input vector, and output vector
Energy egy;
VectorFloat output;
VectorFloat input(L"0, 1, 2");

// choose algorithm
egy.setAlgorithm(Energy::SUM);

// choose implementation
egy.setImplementation(Energy::DB);

// compute the energy of input data
egy.compute(output, input);
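For readers without the ISIP classes at hand, the same calculation can be sketched in standard C++. This is an assumption about what the SUM algorithm with the DB implementation amounts to (sum of squares, then 10·log10); the function name `energySumDb` and the floor value are hypothetical, and the library's own scaling and flooring may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Sum-of-squares energy of a frame, reported in decibels.
// floor_value (an assumed default) guards against log10(0) for an all-zero frame.
double energySumDb(const std::vector<double>& x, double floor_value = 1e-10) {
  assert(floor_value > 0.0);
  double sum_sq = 0.0;
  for (double v : x) {
    sum_sq += v * v;
  }
  return 10.0 * std::log10(std::max(sum_sq, floor_value));
}
```

For the input (0, 1, 2) the sum of squares is 5, so under these assumptions the result is 10·log10(5), about 6.99 dB.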

ADDING NEW ALGORITHMS

class AlgorithmBase {
public:
  // Processing:
  virtual boolean init();
  virtual boolean apply();

  // Configuration:
  virtual const String& className() const;
  virtual long getLeadingPad() const;
  virtual long getTrailingPad() const;
  virtual CMODE getOutputMode() const;
  virtual float getOutputSampleFrequency() const;
  virtual boolean setParser();

  // Debugging:
  boolean displayStart();
  boolean displayFinish();
  boolean displayChannel();
  boolean display();
};

• Interface contract allows extensibility to new algorithms;

• All algorithms are classes that implement this interface;

• Most have a default implementation.
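The extensibility claim can be illustrated with a self-contained sketch. Everything here is hypothetical: `AlgorithmSketch` is a much reduced stand-in for the interface above (plain `std::vector<float>` instead of the ISIP data types), and `ZeroMean` is an invented example algorithm. It shows a new class plugging in by overriding only what it needs and inheriting the defaults for the rest.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the interface contract.
class AlgorithmSketch {
 public:
  virtual ~AlgorithmSketch() = default;
  virtual std::string className() const = 0;
  // Defaults that most algorithms can inherit unchanged.
  virtual long getLeadingPad() const { return 0; }
  virtual long getTrailingPad() const { return 0; }
  virtual bool apply(std::vector<float>& output,
                     const std::vector<float>& input) = 0;
};

// An invented new algorithm: subtract the mean of the input vector.
// No existing class has to change to add it.
class ZeroMean : public AlgorithmSketch {
 public:
  std::string className() const override { return "ZeroMean"; }

  bool apply(std::vector<float>& output,
             const std::vector<float>& input) override {
    output = input;
    if (input.empty()) {
      return true;
    }
    float sum = 0.0f;
    for (float v : input) {
      sum += v;
    }
    const float mean = sum / static_cast<float>(input.size());
    for (float& v : output) {
      v -= mean;
    }
    return true;
  }
};
```

Code that holds only an `AlgorithmSketch&` can run `ZeroMean` without knowing its concrete type, which is what allows a configuration tool to treat every block uniformly.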

ADDING NEW ALGORITHMS

boolean Energy::init() { return true; }

const String& Energy::className() const { return CLASS_NAME; }

long Energy::getLeadingPad() const { return 0; }

long Energy::getTrailingPad() const { return 0; }

boolean Energy::apply(Vector<AlgorithmData>& output, const Vector<AlgorithmData>& input) {
  // determine what channel to operate on
  …
  if (algorithm_d == SUM) {
    computeSum(output(0).makeVectorFloat(), input(0).getVectorFloat());
  }
  …
}

ADDING NEW ALGORITHMS

boolean Energy::computeSum(VectorFloat& output_a, const VectorFloat& input_a) {

  // compute the sum of squares
  float e = input_a.sumSquare();

  // compute the scale factor according to specified implementation
  float scaled_energy = scale(e, input_a.length());

  // the length of the output vector should be 1 as it only contains the energy
  output_a.setLength(1);

  // assign the value of energy to the output
  output_a(0) = Integral::max(floor_d, scaled_energy);

  // exit gracefully
  return true;
}

DEFINITIONS

Algorithm:

• Input and output are arrays of floating-point numbers

• Algorithms correspond to basic DSP principles

Recipe:

• A collection of algorithms run serially: the output of A_{n-1} is the input to A_n

• Named inputs and outputs

• Allows reuse of processing blocks between systems
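A minimal sketch of the recipe idea, under stated assumptions: `RecipeSketch`, `squareEach`, and `sumAll` are hypothetical names, and each algorithm is reduced to a function from one float array to another. The recipe runs its steps in order, feeding the output of each step to the next.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

using Vec = std::vector<float>;
using Step = std::function<Vec(const Vec&)>;

// Hypothetical recipe: a named input, a named output, and serial steps.
struct RecipeSketch {
  std::string input_name;
  std::string output_name;
  std::vector<Step> steps;

  // Run A_1 ... A_n in order; the output of A_{n-1} is the input to A_n.
  Vec run(const Vec& input) const {
    Vec data = input;
    for (const Step& step : steps) {
      data = step(data);
    }
    return data;
  }
};

// Two tiny example algorithms.
Vec squareEach(const Vec& x) {
  Vec y;
  for (float v : x) {
    y.push_back(v * v);
  }
  return y;
}

Vec sumAll(const Vec& x) {
  float total = 0.0f;
  for (float v : x) {
    total += v;
  }
  return Vec{total};
}
```

Chaining `squareEach` and `sumAll` in one recipe yields a sum-of-squares energy block, and either step can be reused in another recipe unchanged.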

HIERARCHY OF ALGORITHM CLASSES

AlgorithmBase

Classes that inherit from AlgorithmBase: FourierTransform, Energy, Constant, Calculus, Generator, Prediction, Window, Cepstrum, Filter, Reflection, Statistics, Math, Correlation, FilterBank, Mask, Covariance

Recipe (input name, output name) contains, at runtime: Algorithm 1, Algorithm 2, ..., Algorithm n

FRONT-END CONFIGURATION TOOL

• Design a front-end by creating a block diagram;

• Allows rapid prototyping of ideas;

• New modules can easily be added to the system;

• The resulting parameter file is then the input to a full speech recognition system.

RESPONSIBILITIES OF THE UTILITY

• Parses the file containing the recipe created in the configuration tool;

• Synchronizes different paths along the block flow diagram contained in the recipe;

• Prepares input and output data buffers for each algorithm;

• Schedules the sequence of required signal processing operations;

• Processes data through the recipe;

• Manages large collections of data files.
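One common way to schedule the blocks of a flow diagram is a topological sort, sketched below with Kahn's algorithm. This is an assumption about how such scheduling could be done, not a description of the ISIP utility; `scheduleBlocks` and the integer block ids are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Order the blocks so that each one runs after every block feeding it.
// edges: (from, to) pairs meaning "the output of from is an input of to".
// Returns an empty vector if the diagram contains a cycle.
std::vector<int> scheduleBlocks(
    int num_blocks, const std::vector<std::pair<int, int>>& edges) {
  assert(num_blocks >= 0);
  std::vector<std::vector<int>> successors(num_blocks);
  std::vector<int> pending_inputs(num_blocks, 0);
  for (const auto& e : edges) {
    assert(e.first >= 0 && e.first < num_blocks);
    assert(e.second >= 0 && e.second < num_blocks);
    successors[e.first].push_back(e.second);
    ++pending_inputs[e.second];
  }

  // Blocks with no pending inputs are ready to run.
  std::queue<int> ready;
  for (int b = 0; b < num_blocks; ++b) {
    if (pending_inputs[b] == 0) {
      ready.push(b);
    }
  }

  std::vector<int> order;
  while (!ready.empty()) {
    const int b = ready.front();
    ready.pop();
    order.push_back(b);
    for (int next : successors[b]) {
      if (--pending_inputs[next] == 0) {
        ready.push(next);
      }
    }
  }

  if (order.size() != static_cast<std::size_t>(num_blocks)) {
    order.clear();  // some blocks never became ready: the diagram has a cycle
  }
  return order;
}
```

A block that joins two paths only becomes ready once both of its inputs have been produced, which is the synchronization the second bullet refers to.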

VERIFICATION STRATEGY

• Correctness: The implementation of each algorithm is verified manually or by using other tools such as MATLAB.

• Usability: The usability of the tools was assessed and improved through extensive user testing conducted over several training sessions.

• Speech recognition experiments: The correctness of the tools was also verified by speech recognition experiments.

STATE-OF-THE-ART FEATURES

• Mel-frequency cepstral coefficients (MFCCs);

• Cepstral mean subtraction;

• Energy normalization;

• 1st and 2nd order differential features;

• These features are used by most commercial speech recognition systems.
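Two of the features listed above are simple enough to sketch directly. The code assumes the cepstral coefficients have already been computed, one vector per frame and all of the same length. The names `subtractMean` and `delta` are hypothetical, and the central-difference formula is a simplification chosen for brevity; production front-ends often use a regression over a longer window instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Frames = std::vector<std::vector<double>>;  // frames[t][i]

// Cepstral mean subtraction: remove each coefficient's mean over the utterance.
Frames subtractMean(const Frames& frames) {
  if (frames.empty()) {
    return frames;
  }
  const std::size_t dim = frames[0].size();
  std::vector<double> mean(dim, 0.0);
  for (const auto& f : frames) {
    assert(f.size() == dim);
    for (std::size_t i = 0; i < dim; ++i) {
      mean[i] += f[i];
    }
  }
  for (double& m : mean) {
    m /= static_cast<double>(frames.size());
  }
  Frames out = frames;
  for (auto& f : out) {
    for (std::size_t i = 0; i < dim; ++i) {
      f[i] -= mean[i];
    }
  }
  return out;
}

// First-order differential features by central difference,
// d[t] = (c[t+1] - c[t-1]) / 2, repeating the edge frames at the boundaries.
Frames delta(const Frames& frames) {
  const std::size_t num = frames.size();
  Frames out = frames;
  for (std::size_t t = 0; t < num; ++t) {
    const auto& prev = frames[t > 0 ? t - 1 : 0];
    const auto& next = frames[t + 1 < num ? t + 1 : num - 1];
    for (std::size_t i = 0; i < out[t].size(); ++i) {
      out[t][i] = (next[i] - prev[i]) / 2.0;
    }
  }
  return out;
}
```

Second-order differential features follow by applying `delta` to its own output.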

EXPERIMENTAL RESULTS

[Figure: bar chart of word error rate, WER (%), on an axis from 0 to 9, for the WSJ0 and TIDIGITS experiments. Legend: New, Baseline.]

CONCLUSION

• The front-end performs signal processing for speech recognition systems;

• The ISIP front-end is built on an extensible library of basic DSP building blocks;

• A block diagram interface is used to configure the front-end data flow;

• The tool’s usability was improved through multiple training sessions with new users;

• The system’s correctness was verified through speech recognition experiments.