
Speech Lab, ECE, State University of New York at Binghamton


Dimensionality Reduction of Speech Features Using Nonlinear Principal Components Analysis
Stephen A. Zahorian, Tara Singh*, Hongbing Hu

Department of Electrical and Computer Engineering, Binghamton University, Binghamton, NY, USA
* Department of Electrical and Computer Engineering, Old Dominion University, Norfolk, VA, USA

Introduction
Difficulties in automatic speech recognition:
Large dimensionality of acoustic feature spaces
Significant load in feature training (the "curse of dimensionality")

Linear dimensionality reduction methods: Principal Components Analysis (PCA), Linear Discriminant Analysis (LDA)

Drawback of linear methods: they can result in poor data representations
The straight-line fit to the data obtained by linear PCA does not accurately represent the original distribution of the data
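To make this drawback concrete, here is a small sketch (not from the poster; the parabolic toy data and the NumPy implementation are illustrative assumptions). Linear PCA restricted to one component can only reconstruct points along a straight line, so data lying on a curve is represented poorly no matter how the line is fit:

```python
import numpy as np

# Toy data on a curved (parabolic) 1-D manifold in 2-D space.
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=200)
X = np.column_stack([t, t**2]) + 0.02 * rng.standard_normal((200, 2))

# Linear PCA: project onto the top principal component.
mu = X.mean(axis=0)
Xc = X - mu
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
w = Vt[0]                            # first principal direction
X_recon = mu + np.outer(Xc @ w, w)   # rank-1 (straight-line) reconstruction

# The straight line cannot follow the parabola, so the
# reconstruction error stays large no matter how much data we use.
err = np.mean(np.sum((X - X_recon) ** 2, axis=1))
print(f"mean squared reconstruction error with 1 linear PC: {err:.4f}")
```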

NLPCA Approaches
Nonlinear Principal Components Analysis (NLPCA): a nonlinear transformation is applied to obtain a transformed version of the data for PCA

Nonlinear transformation

Two approaches (NLPCA1 and NLPCA2) were used for training the neural network

$x \rightarrow \Phi(x)$, where $\Phi(x)$ is the transformed feature of the data point $x$ for machine learning
$\mathbb{R}^M$: the $M$-dimensional feature space
$\Phi(\cdot): \mathbb{R}^M \rightarrow \mathbb{R}^D$: a neural network mapping to obtain data more suitable for linear transformations


NLPCA1
The neural network is trained as an identity map:
– Minimize mean square error using targets that are the same as the inputs
– Training with regularization is often needed to "guide" the network to a better minimum in error
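Written out (a reconstruction from the bullets above; the squared-norm weight penalty is an assumed form of the regularization), NLPCA1 trains the network $f_{\mathbf{W}}$ so that its output matches its input:

$$
\min_{\mathbf{W}} \; \sum_{n=1}^{N} \left\| \mathbf{x}_n - f_{\mathbf{W}}(\mathbf{x}_n) \right\|^2 \;+\; \lambda \left\| \mathbf{W} \right\|^2
$$

with the reduced features read from the bottleneck layer of $f_{\mathbf{W}}$.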

NLPCA2
The neural network is trained as a classifier:
– The network is trained to maximize discrimination

[Diagram: Input Data → Bottleneck neural network → Dimensionality-Reduced Data]
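A minimal PyTorch sketch of this bottleneck architecture follows. The layer sizes, tanh activations, and training settings are assumptions for illustration, not the poster's configuration; the poster specifies only that NLPCA1 trains the network as an identity map and NLPCA2 as a classifier, with the reduced features taken at the bottleneck:

```python
import torch
import torch.nn as nn

class BottleneckNet(nn.Module):
    """Encoder -> low-dimensional bottleneck -> task-specific head."""
    def __init__(self, in_dim=39, bottleneck=10, n_classes=10, hidden=50):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, bottleneck),
        )
        # NLPCA1 head: reconstruct the input (identity map).
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, hidden), nn.Tanh(),
            nn.Linear(hidden, in_dim),
        )
        # NLPCA2 head: discriminate between classes.
        self.classifier = nn.Linear(bottleneck, n_classes)

    def forward(self, x):
        z = self.encoder(x)                      # reduced features
        return z, self.decoder(z), self.classifier(z)

def train(net, X, y, mode="nlpca2", epochs=100, weight_decay=1e-4):
    # weight_decay plays the role of the regularization that "guides"
    # NLPCA1 toward a better minimum in error.
    opt = torch.optim.Adam(net.parameters(), lr=1e-3, weight_decay=weight_decay)
    for _ in range(epochs):
        opt.zero_grad()
        z, x_hat, logits = net(X)
        if mode == "nlpca1":                     # identity map: targets = inputs
            loss = nn.functional.mse_loss(x_hat, X)
        else:                                    # classifier: maximize discrimination
            loss = nn.functional.cross_entropy(logits, y)
        loss.backward()
        opt.step()
    return net

# Usage with stand-in data (shapes mimic the 39-D features / 10 vowels).
X = torch.randn(256, 39)
y = torch.randint(0, 10, (256,))
net = train(BottleneckNet(), X, y, mode="nlpca2")
with torch.no_grad():
    Z, _, _ = net(X)                             # 10-D reduced data
print(Z.shape)                                   # torch.Size([256, 10])
```

For NLPCA1 the call would use mode="nlpca1" instead; in both cases the encoder output z is the dimensionality-reduced representation passed to a downstream classifier.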

Simulation of NLPCA1
[Figure: Input and output plots for semi-random 2-D data; the output is data reconstructed by an NLPCA1-trained neural network with 1 hidden node]
[Figure: An example with 3-D data; input and output plots of 3-D Gaussian data before and after a neural network with 2 hidden nodes]

Experimental Evaluation
Database: NTIMIT
Target vowels: /ah/, /ee/, /ue/, /ae/, /ur/, /ih/, /eh/, /aw/, /uh/, /oo/
Training data: 31,300 tokens
Testing data: 11,625 tokens
Features: 39 DCTC-DCS
Transformation methods compared: original features, LDA, PCA, NLPCA1, and NLPCA2
Classifiers: neural network and MXL (a maximum likelihood, Mahalanobis distance based classifier under a Gaussian assumption)
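Since the poster describes MXL in only one line, the following is a hedged NumPy reconstruction of such a classifier, not the authors' implementation: per-class Gaussian means with a pooled (common) covariance, classifying by minimum Mahalanobis distance, which coincides with maximum likelihood under equal priors:

```python
import numpy as np

class MXLClassifier:
    """Gaussian classifier with a shared (pooled) covariance matrix.

    With equal priors and a common covariance, maximum likelihood is
    equivalent to choosing the class whose mean has the smallest
    Mahalanobis distance to the input feature vector.
    """
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        # Pooled within-class covariance, common to all classes.
        cov = sum(np.cov(X[y == c], rowvar=False) * (np.sum(y == c) - 1)
                  for c in self.classes) / (len(X) - len(self.classes))
        self.cov_inv = np.linalg.inv(cov)
        return self

    def predict(self, X):
        d = X[:, None, :] - self.means[None, :, :]         # (N, C, D) differences
        # Squared Mahalanobis distance of each sample to each class mean.
        m2 = np.einsum('ncd,de,nce->nc', d, self.cov_inv, d)
        return self.classes[np.argmin(m2, axis=1)]

# Usage (with hypothetical arrays):
#   MXLClassifier().fit(X_train, y_train).predict(X_test)
```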

Experiment 1
The same training data were used to train the transformations and the classifiers
The number of features varied from 1 to 39
Variable percentages of the training data (1%, 2%, 5%, 10%, 25%, 50%, and 100%) were used

Experiment 1 Results
Classification accuracies of neural network (left) and MXL (right) classifiers with various types of features, using all available training data:



[Figure: Classification accuracy (%) vs. number of features (1-39) for the neural network (left) and MXL (right) classifiers; curves: Org, LDA, PCA, NLPCA1, NLPCA2]

For both cases, the highest accuracy was obtained with NLPCA2, especially with a small number of features.

Classification accuracies of neural network (left) and MXL (right) classifiers with various percentages of training data, using NLPCA2 (10 features) and original features (10 and 39 features):

[Figure: Classification accuracy (%) vs. percentage of training data (1%-100%) for the neural network (left) and MXL (right) classifiers; curves: Org (10-D), Org (39-D), NLPCA2]

NLPCA2 outperforms the 10-D original features when 10% or more of the training data is used, and performs comparably to the 39-D original features.

Classification accuracies of original features and NLPCA2-reduced features with 2% (left) and 50% (right) of the training data:

[Figure: Classification accuracy (%) vs. number of features for the neural network (Neu) and MXL classifiers; curves: Org (Neu), Org (MXL), NLPCA2 (Neu), NLPCA2 (MXL)]

Using 50% of the training data, NLPCA2 performs substantially better than the original features, at least for 12 or fewer features.

Experiment 2
50% of the training data was used for training the transformations, and a variable percentage, ranging from 1% to 100% of the other half of the training data, was used for training the classifiers.
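As a concrete (assumed) reading of this protocol, the sketch below splits the training tokens accordingly; the random, unstratified split is my simplification, since the poster does not state how the subsets were drawn:

```python
import numpy as np

def experiment2_split(X, y, classifier_fraction=0.10, seed=0):
    """Split training data as in Experiment 2: 50% for training the
    transformation, and a chosen fraction of the remaining 50% for
    training the classifier."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    half = len(X) // 2
    transform_idx = idx[:half]                     # trains the transformation
    rest = idx[half:]                              # pool for classifier training
    clf_idx = rest[: int(classifier_fraction * len(rest))]
    return (X[transform_idx], y[transform_idx]), (X[clf_idx], y[clf_idx])
```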

Experiment 2 Results
Classification accuracies of neural network (left) and MXL (right) classifiers, using 10% of the classifier training data:

[Figure: Classification accuracy (%) vs. number of features (1, 2, 4, 8, 16, 32) for the neural network (left) and MXL (right) classifiers; curves: Org, LDA, PCA, NLPCA1, NLPCA2]

For both the neural network and MXL classifiers, NLPCA2 clearly performs much better than the other transformations or the original features.

Classification accuracies of neural network (left) and MXL (right) classifiers with various percentages of classifier training data, using 4 features:

[Figure: Classification accuracy (%) vs. percentage of classifier training data (1%-100%) for the neural network (left) and MXL (right) classifiers; curves: Org, LDA, PCA, NLPCA1, NLPCA2]

NLPCA2 yields the best performance, with about 68% accuracy for both cases. Similar trends were also observed for 1, 2, 8, 16, and 32 features.

Conclusions
The nonlinear technique minimizing mean square reconstruction error (NLPCA1) can be very effective for representing data that lies in curved subspaces, but does not appear to offer any advantages over linear dimensionality reduction methods for a speech classification task
The nonlinear technique based on minimizing classification error (NLPCA2) is quite effective for accurate classification in low-dimensional spaces
The reduced features appear to be well modeled as Gaussian features with a common covariance matrix
Nonlinear PCA (NLPCA2) is much more effective than ordinary PCA for reducing dimensionality; however, with a "good" classification method, neither dimensionality reduction method improves classification accuracy.

Acknowledgement
This work was partially supported by JWFC 900