
Speech Lab, ECE, State University of New York at Binghamton


Dimensionality Reduction of Speech Features Using Nonlinear Principal Components Analysis
Stephen A. Zahorian, Tara Singh*, Hongbing Hu

Department of Electrical and Computer Engineering, Binghamton University, Binghamton, NY, USA
* Department of Electrical and Computer Engineering, Old Dominion University, Norfolk, VA, USA

Introduction
Difficulties in automatic speech recognition:
Large dimensionality of acoustic feature spaces
Significant load in feature training (the "curse of dimensionality")

Linear dimensionality reduction methods: Principal Components Analysis (PCA), Linear Discriminant Analysis (LDA)

Drawback of linear methods: they can result in poor data representations
The straight-line fit to the data obtained by linear PCA does not accurately represent the original distribution of the data
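To make this drawback concrete, here is a small sketch (not from the poster; the parabolic toy data and the NumPy implementation are illustrative assumptions). Linear PCA restricted to one component can only reconstruct points along a straight line, so data lying on a curve is represented poorly no matter how the line is fit:

```python
import numpy as np

# Toy data on a curved (parabolic) 1-D manifold in 2-D space.
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=200)
X = np.column_stack([t, t**2]) + 0.02 * rng.standard_normal((200, 2))

# Linear PCA: project onto the top principal component.
mu = X.mean(axis=0)
Xc = X - mu
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
w = Vt[0]                            # first principal direction
X_recon = mu + np.outer(Xc @ w, w)   # rank-1 (straight-line) reconstruction

# The straight line cannot follow the parabola, so the
# reconstruction error stays large no matter how much data we use.
err = np.mean(np.sum((X - X_recon) ** 2, axis=1))
print(f"mean squared reconstruction error with 1 linear PC: {err:.4f}")
```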

NLPCA Approaches
Nonlinear Principal Components Analysis (NLPCA): a nonlinear transformation is applied to obtain a transformed version of the data for PCA

Nonlinear transformation

Two approaches (NLPCA1 and NLPCA2) were used for training the neural network

$x \rightarrow \Phi(x)$, where $\Phi(x)$ is the transformed feature of the data point $x$ for machine learning
$\mathbb{R}^M$: the $M$-dimensional feature space
$\Phi(\cdot): \mathbb{R}^M \rightarrow \mathbb{R}^D$: a neural network mapping to obtain data more suitable for linear transformations


NLPCA1
The neural network is trained as an identity map:
– Minimize mean square error using targets that are the same as the inputs
– Training with regularization is often needed to "guide" the network to a better minimum in error
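Written out (a reconstruction from the bullets above; the squared-norm weight penalty is an assumed form of the regularization), NLPCA1 trains the network $f_{\mathbf{W}}$ so that its output matches its input:

$$
\min_{\mathbf{W}} \; \sum_{n=1}^{N} \left\| \mathbf{x}_n - f_{\mathbf{W}}(\mathbf{x}_n) \right\|^2 \;+\; \lambda \left\| \mathbf{W} \right\|^2
$$

with the reduced features read from the bottleneck layer of $f_{\mathbf{W}}$.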

NLPCA2
The neural network is trained as a classifier:
– The network is trained to maximize discrimination

[Diagram: Input Data → Bottleneck neural network → Dimensionality-Reduced Data]
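A minimal PyTorch sketch of this bottleneck architecture follows. The layer sizes, tanh activations, and training settings are assumptions for illustration, not the poster's configuration; the poster specifies only that NLPCA1 trains the network as an identity map and NLPCA2 as a classifier, with the reduced features taken at the bottleneck:

```python
import torch
import torch.nn as nn

class BottleneckNet(nn.Module):
    """Encoder -> low-dimensional bottleneck -> task-specific head."""
    def __init__(self, in_dim=39, bottleneck=10, n_classes=10, hidden=50):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, bottleneck),
        )
        # NLPCA1 head: reconstruct the input (identity map).
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, hidden), nn.Tanh(),
            nn.Linear(hidden, in_dim),
        )
        # NLPCA2 head: discriminate between classes.
        self.classifier = nn.Linear(bottleneck, n_classes)

    def forward(self, x):
        z = self.encoder(x)                      # reduced features
        return z, self.decoder(z), self.classifier(z)

def train(net, X, y, mode="nlpca2", epochs=100, weight_decay=1e-4):
    # weight_decay plays the role of the regularization that "guides"
    # NLPCA1 toward a better minimum in error.
    opt = torch.optim.Adam(net.parameters(), lr=1e-3, weight_decay=weight_decay)
    for _ in range(epochs):
        opt.zero_grad()
        z, x_hat, logits = net(X)
        if mode == "nlpca1":                     # identity map: targets = inputs
            loss = nn.functional.mse_loss(x_hat, X)
        else:                                    # classifier: maximize discrimination
            loss = nn.functional.cross_entropy(logits, y)
        loss.backward()
        opt.step()
    return net

# Usage with stand-in data (shapes mimic the 39-D features / 10 vowels).
X = torch.randn(256, 39)
y = torch.randint(0, 10, (256,))
net = train(BottleneckNet(), X, y, mode="nlpca2")
with torch.no_grad():
    Z, _, _ = net(X)                             # 10-D reduced data
print(Z.shape)                                   # torch.Size([256, 10])
```

For NLPCA1 the call would use mode="nlpca1" instead; in both cases the encoder output z is the dimensionality-reduced representation passed to a downstream classifier.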

Simulation of NLPCA1
[Figure: Input and output plots for semi-random 2-D data; the output is data reconstructed by an NLPCA1-trained neural network with 1 hidden node]
[Figure: An example with 3-D data; input and output plots of 3-D Gaussian data before and after a neural network with 2 hidden nodes]

Experimental Evaluation
Database: NTIMIT
Target vowels: /ah/, /ee/, /ue/, /ae/, /ur/, /ih/, /eh/, /aw/, /uh/, /oo/
Training data: 31,300 tokens
Testing data: 11,625 tokens
Features: 39 DCTC-DCS
Transformation methods compared: original features, LDA, PCA, NLPCA1, and NLPCA2
Classifiers: neural network and MXL (a maximum likelihood, Mahalanobis distance based classifier under a Gaussian assumption)
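Since the poster describes MXL in only one line, the following is a hedged NumPy reconstruction of such a classifier, not the authors' implementation: per-class Gaussian means with a pooled (common) covariance, classifying by minimum Mahalanobis distance, which coincides with maximum likelihood under equal priors:

```python
import numpy as np

class MXLClassifier:
    """Gaussian classifier with a shared (pooled) covariance matrix.

    With equal priors and a common covariance, maximum likelihood is
    equivalent to choosing the class whose mean has the smallest
    Mahalanobis distance to the input feature vector.
    """
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        # Pooled within-class covariance, common to all classes.
        cov = sum(np.cov(X[y == c], rowvar=False) * (np.sum(y == c) - 1)
                  for c in self.classes) / (len(X) - len(self.classes))
        self.cov_inv = np.linalg.inv(cov)
        return self

    def predict(self, X):
        d = X[:, None, :] - self.means[None, :, :]         # (N, C, D) differences
        # Squared Mahalanobis distance of each sample to each class mean.
        m2 = np.einsum('ncd,de,nce->nc', d, self.cov_inv, d)
        return self.classes[np.argmin(m2, axis=1)]

# Usage (with hypothetical arrays):
#   MXLClassifier().fit(X_train, y_train).predict(X_test)
```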

Experiment 1
The same training data were used to train the transformations and the classifiers
The number of features varied from 1 to 39
Variable percentages of the training data (1%, 2%, 5%, 10%, 25%, 50%, and 100%) were used

Experiment 1 Results
Classification accuracies of neural network (left) and MXL (right) classifiers with various types of features, using all available training data:



[Figure: Classification accuracy (%) vs. number of features (1-39) for the neural network (left) and MXL (right) classifiers; curves: Org, LDA, PCA, NLPCA1, NLPCA2]

For both cases, the highest accuracy was obtained with NLPCA2, especially with a small number of features.

Classification accuracies of neural network (left) and MXL (right) classifiers with various percentages of training data, using NLPCA2 (10 features) and original features (10 and 39 features):

[Figure: Classification accuracy (%) vs. percentage of training data (1%-100%) for the neural network (left) and MXL (right) classifiers; curves: Org (10-D), Org (39-D), NLPCA2]

NLPCA2 outperforms the 10-D original features when 10% or more of the training data is used, and performs comparably to the 39-D original features.

Classification accuracies of original features and NLPCA2-reduced features with 2% (left) and 50% (right) of the training data:

[Figure: Classification accuracy (%) vs. number of features for the neural network (Neu) and MXL classifiers; curves: Org (Neu), Org (MXL), NLPCA2 (Neu), NLPCA2 (MXL)]

Using 50% of the training data, NLPCA2 performs substantially better than the original features, at least for 12 or fewer features.

Experiment 2
50% of the training data was used for training the transformations, and a variable percentage, ranging from 1% to 100% of the other half of the training data, was used for training the classifiers.
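As a concrete (assumed) reading of this protocol, the sketch below splits the training tokens accordingly; the random, unstratified split is my simplification, since the poster does not state how the subsets were drawn:

```python
import numpy as np

def experiment2_split(X, y, classifier_fraction=0.10, seed=0):
    """Split training data as in Experiment 2: 50% for training the
    transformation, and a chosen fraction of the remaining 50% for
    training the classifier."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    half = len(X) // 2
    transform_idx = idx[:half]                     # trains the transformation
    rest = idx[half:]                              # pool for classifier training
    clf_idx = rest[: int(classifier_fraction * len(rest))]
    return (X[transform_idx], y[transform_idx]), (X[clf_idx], y[clf_idx])
```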

Experiment 2 Results
Classification accuracies of neural network (left) and MXL (right) classifiers, using 10% of the classifier training data:

[Figure: Classification accuracy (%) vs. number of features (1, 2, 4, 8, 16, 32) for the neural network (left) and MXL (right) classifiers; curves: Org, LDA, PCA, NLPCA1, NLPCA2]

For both the neural network and MXL classifiers, NLPCA2 clearly performs much better than the other transformations or the original features.

Classification accuracies of neural network (left) and MXL (right) classifiers with various percentages of classifier training data, using 4 features:

[Figure: Classification accuracy (%) vs. percentage of classifier training data (1%-100%) for the neural network (left) and MXL (right) classifiers; curves: Org, LDA, PCA, NLPCA1, NLPCA2]

NLPCA2 yields the best performance, with about 68% accuracy for both cases. Similar trends were also observed for 1, 2, 8, 16, and 32 features.

Conclusions
The nonlinear technique minimizing mean square reconstruction error (NLPCA1) can be very effective for representing data that lies in curved subspaces, but does not appear to offer any advantages over linear dimensionality reduction methods for a speech classification task
The nonlinear technique based on minimizing classification error (NLPCA2) is quite effective for accurate classification in low-dimensional spaces
The reduced features appear to be well modeled as Gaussian features with a common covariance matrix
Nonlinear PCA (NLPCA2) is much more effective than ordinary PCA for reducing dimensionality; however, with a "good" classification method, neither dimensionality reduction method improves classification accuracy.

Acknowledgement
This work was partially supported by JWFC 900