Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model
TRANSCRIPT
Minimum Mean Squared Error Time Series Classification Using an Echo
State Network Prediction Model
Mark Skowronski and John Harris
Computational Neuro-Engineering Lab
University of Florida
Automatic Speech Recognition Using an Echo State Network
Mark Skowronski and John Harris
Computational Neuro-Engineering Lab
University of Florida
Transformation of a graduate student
2000 to 2006
Motivation: Man vs. Machine
Wall Street Journal/Broadcast news readings, 5000 words
Untrained human listeners vs. Cambridge HTK LVCSR system
Overview
• Why is ASR so poor?
• Hidden Markov Model (HMM)
• Echo state network (ESN)
• ESN applied to speech
• Conclusions
ASR State of the Art
• Feature extraction: MFCC vs. HFCC*
• Acoustic pattern rec: HMM
• Language models
*Skowronski & Harris. JASA, (3):1774–1780, 2004.
[Figure: filter bank with filters m1–m6 spaced across frequency, producing cepstral coefficients]
Hidden Markov Model
Premier stochastic model of non-stationary time series used for decision making.
Assumptions:
1) Speech is a piecewise-stationary process.
2) Feature frames are independent of one another.
3) State durations are exponentially (geometrically) distributed, as illustrated below.
4) State transition probability depends only on the previous and next states.
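Assumption 3 follows directly from the Markov structure: with a constant self-transition probability $a_{ii}$, the probability of remaining in state $i$ for exactly $d$ frames is geometric, the discrete analogue of exponential decay:

$$P(d) = a_{ii}^{\,d-1}\,(1 - a_{ii}), \qquad d = 1, 2, \dots$$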
ASR Example
• Isolated English digits "zero" through "nine" from TI46: 8 male and 8 female speakers, 26 utterances each, fs = 12.5 kHz.
• 10 word models, various numbers of states and Gaussians per state.
• Features: 13 HFCC, 100 fps, Hamming window, pre-emphasis (α = 0.95), CMS, Δ + ΔΔ (±4 frames); see the sketch after this list.
• Pre-processing: zero-mean and whitening transform
• M1/F1: testing; M2/F2: validation; M3–M8/F3–F8: training
• Test: corrupted by additive noise from “real” sources (subway, babble, car, exhibition hall, restaurant, street, airport terminal, train station)
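A minimal sketch of the CMS and Δ/ΔΔ feature steps listed above, assuming NumPy and a (frames × 13) array of HFCC coefficients; the regression-based delta formula is a standard choice, not something the slide specifies:

```python
import numpy as np

def cms_and_deltas(cepstra, width=4):
    """Sketch of CMS plus delta and delta-delta features over +/-4 frames.
    `cepstra` is assumed to be a (num_frames, 13) array of HFCC coefficients."""
    # CMS: remove the per-utterance mean of each coefficient
    c = np.asarray(cepstra, dtype=float)
    c -= c.mean(axis=0, keepdims=True)

    def delta(x):
        # Standard regression-based delta over +/-`width` frames
        num = np.zeros_like(x)
        padded = np.pad(x, ((width, width), (0, 0)), mode="edge")
        for k in range(1, width + 1):
            num += k * (padded[width + k:width + k + len(x)] -
                        padded[width - k:width - k + len(x)])
        return num / (2 * sum(k * k for k in range(1, width + 1)))

    d1 = delta(c)                      # delta
    d2 = delta(d1)                     # delta-delta
    return np.hstack([c, d1, d2])      # 13 + 13 + 13 = 39-dimensional frames
```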
HMM Test Results
Overcoming the limitations of HMMs
• HMMs do not take advantage of the dynamics of speech
• Well-known HMM limitations include:
  – Only the present state affects transition probabilities
  – Successive observations are independent
  – Static density models are assumed
Need an architecture that better captures the dynamics of speech
Echo State Network
Recurrent neural network proposed by Jaeger (2001).

[Diagram: input weights W^in feed a recurrent reservoir with internal weights W; a linear mapper with weights W^out produces the output.]

Recurrent "reservoir" of nonlinear processing elements with random, untrained weights.
Random, untrained input weights.
Linear readout with easily trained weights.
Note similarities to the Liquid State Machine.
ESN Diagram & Equations
$$\mathbf{x}(n) = f\left(\mathbf{W}\,\mathbf{x}(n-1) + \mathbf{W}^{in}\,\mathbf{u}(n)\right)$$

$$\mathbf{y}(n) = \mathbf{W}^{out}\,\mathbf{x}(n)$$
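A minimal sketch of these two equations, assuming NumPy arrays with W of shape (M, M), W_in of shape (M, d_in), and W_out of shape (d_out, M):

```python
import numpy as np

def esn_run(u, W, W_in, W_out, f=np.tanh):
    """Run an ESN over an input sequence u of shape (T, d_in).
    Returns the reservoir states x(n) and readout outputs y(n)."""
    M = W.shape[0]
    x = np.zeros(M)
    states, outputs = [], []
    for n in range(len(u)):
        # x(n) = f(W x(n-1) + W_in u(n))
        x = f(W @ x + W_in @ u[n])
        states.append(x.copy())
        # y(n) = W_out x(n)
        outputs.append(W_out @ x)
    return np.array(states), np.array(outputs)
```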
How to classify with predictors
Build 10 word models that are trained to predict the future of each of the 10 digits
[Diagram: the current feature frame passes through a delay (z⁻¹) into 10 word-model predictors (digits 0–9); a selector ("?") picks the class of the best predictor.]
The best predictor determines the class
Not a new idea!
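A hedged sketch of the decision rule: run the utterance through each word model's predictor and pick the class with the smallest mean squared prediction error. The `model.predict` interface here is hypothetical, standing in for an ESN word model that predicts the next feature frame:

```python
import numpy as np

def classify_by_prediction(features, word_models):
    """Minimum-MSE classification: the word model that best predicts the
    next feature frame over the utterance determines the class."""
    errors = []
    for model in word_models:                  # one predictor per digit, 0-9
        pred = model.predict(features[:-1])    # hypothetical next-frame predictor
        mse = np.mean((pred - features[1:]) ** 2)
        errors.append(mse)
    return int(np.argmin(errors))              # class label of the best predictor
```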
ESN Training
• Minimize mean-squared error between y(n) and desired signal d(n).
Wiener solution:

$$\mathbf{W}^{out} = \left(\sum_n \mathbf{x}(n)\,\mathbf{x}(n)^T\right)^{-1}\left(\sum_n \mathbf{x}(n)\,\mathbf{d}(n)^T\right) = \mathbf{R}^{-1}\,\mathbf{p}$$
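A sketch of solving this least-squares (Wiener) problem from collected reservoir states and desired next-frame targets; the small ridge term is an added assumption for numerical stability and is not on the slide:

```python
import numpy as np

def train_readout(states, desired, ridge=1e-6):
    """states: (T, M) reservoir states x(n); desired: (T, d_out) targets d(n).
    Returns W_out of shape (d_out, M) so that y(n) = W_out @ x(n)."""
    R = states.T @ states                                   # autocorrelation matrix R
    p = states.T @ desired                                  # cross-correlation p
    W_out = np.linalg.solve(R + ridge * np.eye(R.shape[0]), p)  # R^{-1} p (regularized)
    return W_out.T
```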
Multiple Readout Filters
• Need good predictors for separation of classes
• One linear filter will give mediocre prediction.
• Question: how to divide the reservoir space and use multiple readout filters?
• Answer: a competitive network of filters.
• Question: how to train/test a competitive network of K filters?
• Answer: mimic the HMM.

$$\mathbf{y}_k(n) = \mathbf{W}_k^{out}\,\mathbf{x}(n), \qquad k \in [1, K]$$
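One plausible reading of the competitive network, sketched below: at each frame the readout filter with the smallest squared prediction error wins, and only the winners' errors are accumulated into the utterance score. This matches the winner-take-all combination listed in the final comparison table, though the exact scoring rule is not spelled out here:

```python
import numpy as np

def competitive_prediction_error(states, desired, readouts):
    """Winner-take-all bank of K readout filters: `readouts` is a list of K
    matrices W_out_k with y_k(n) = W_out_k @ x(n); the per-frame winner's
    squared error is accumulated over the utterance."""
    total = 0.0
    for x_n, d_n in zip(states, desired):
        errs = [np.sum((W_k @ x_n - d_n) ** 2) for W_k in readouts]
        total += min(errs)             # winner-take-all across the K filters
    return total / len(states)         # mean squared error of the winning filters
```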
ASR Example
• Same spoken digit experiment as before.
• ESN: M = 60 PEs, r = 2.0, r_in = 0.1; 10 word models, various numbers of states and filters per state (initialization sketched below).
• Identical pre-processing and input features.
• Desired signal: next frame of 39-dimensional features.
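A hedged sketch of reservoir setup under the assumption that r scales the spectral radius of the recurrent weights and r_in scales the random input weights (my reading of the parameter names above):

```python
import numpy as np

def init_reservoir(M=60, d_in=39, r=2.0, r_in=0.1, seed=0):
    """Random, untrained reservoir and input weights for an ESN word model."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, (M, M))
    W *= r / np.max(np.abs(np.linalg.eigvals(W)))   # rescale spectral radius to r
    W_in = r_in * rng.uniform(-1, 1, (M, d_in))     # small random input weights
    return W, W_in
```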
ESN Results
ESN/HMM Comparison
Conclusions
• ESN classifies by predicting.
• Multiple filters mimic the sequential nature of HMMs.
• ESN classifier is noise robust compared to the HMM:
– Ave. over all sources, 0-20 dB SNR: +21 percentage points
– Ave. over all sources: +9 dB SNR
• ESN reservoir provides a dynamical model of the history of the speech.
Questions?
HMM vs. ESN Classifier
| | HMM | ESN Classifier |
|---|---|---|
| Output | Likelihood | MSE |
| Architecture | States, left-to-right | States, left-to-right |
| Minimum element | Gaussian kernel | Readout filter |
| Elements combined | GMM | Winner-take-all |
| Transitions | State transition matrix | Binary switching matrix |
| Training | Segmental K-means (Baum-Welch) | Segmental K-means |
| Discriminatory | No | Maybe, depends on desired signal |