neural network acoustic modelling across the...
TRANSCRIPT
![Page 1: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/1.jpg)
★NNNeural Network Acoustic Modelling
Across the Decades
Steve RenalsUniversity of Edinburgh
ICSI, 14 March 2015
![Page 2: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/2.jpg)
BDNN, CDNN, …Neural Network Acoustic Modelling
Across the Decades
Steve RenalsUniversity of Edinburgh
ICSI, 14 March 2015
![Page 4: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/4.jpg)
DNNDecades of Neural Networks
Steve RenalsUniversity of Edinburgh
ICSI, 14 March 2015
![Page 5: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/5.jpg)
So how was it in the early 90s?
![Page 6: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/6.jpg)
• Big neural networks trained as acoustic classifiers
So how was it in the early 90s?
![Page 7: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/7.jpg)
• Big neural networks trained as acoustic classifiers
• Use an HMM for (limited) sequence processing
So how was it in the early 90s?
![Page 8: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/8.jpg)
• Big neural networks trained as acoustic classifiers
• Use an HMM for (limited) sequence processing
• RNNs starting to look attractive as large-scale acoustic models
So how was it in the early 90s?
![Page 9: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/9.jpg)
• Big neural networks trained as acoustic classifiers
• Use an HMM for (limited) sequence processing
• RNNs starting to look attractive as large-scale acoustic models
• Worrying about speaker adaptation, modelling acoustic context, robustness, …
So how was it in the early 90s?
![Page 10: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/10.jpg)
• Big neural networks trained as acoustic classifiers
• Use an HMM for (limited) sequence processing
• RNNs starting to look attractive as large-scale acoustic models
• Worrying about speaker adaptation, modelling acoustic context, robustness, …
• Use vector processors to train networks quickly
So how was it in the early 90s?
![Page 11: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/11.jpg)
So how was it in the early 90s?
![Page 12: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/12.jpg)
So how was it in the early 90s?
![Page 13: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/13.jpg)
So how was it in the early 90s?
If Moore’s Law had applied to this, then by 2015: 16 billion weights
96 billion examples
![Page 14: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/14.jpg)
So how was it in the early 90s?
Not much has changed?
![Page 15: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/15.jpg)
So how was it in the early 90s?
![Page 16: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/16.jpg)
Context-dependentacoustic modelling
• Then• largely context-independent NNs• some context-dependence
• Now• large scale context-dependent NNs – CD classes
typically (not always) derived from HMM/GMM decision tree
![Page 17: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/17.jpg)
Context-dependentacoustic modelling (then)
![Page 18: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/18.jpg)
Context-dependentacoustic modelling (then)
![Page 19: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/19.jpg)
Context-dependentacoustic modelling (then)
![Page 20: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/20.jpg)
Context-dependentacoustic modelling (then)
![Page 21: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/21.jpg)
Context-dependentacoustic modelling (then)
![Page 22: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/22.jpg)
Context-dependentacoustic modelling (then)
![Page 23: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/23.jpg)
Context-dependentacoustic modelling (then)
![Page 24: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/24.jpg)
Context-dependentacoustic modelling (now)
3-8 hidden layers
~2000 hidden units
~6000 CD phone outputs
~2000 hidden units
9x39 MFCC/LogSpec
![Page 25: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/25.jpg)
Why model context-dependence?
• Divide and conquer the acoustic modelling space, if enough training data
• Context-sensitive adaptation of phone models to local acoustic/phonetic context – enhances discrimination of acoustically confusable phones
![Page 26: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/26.jpg)
Why model context-dependence?
• Divide and conquer the acoustic modelling space, if enough training data
• Context-sensitive adaptation of phone models to local acoustic/phonetic context – enhances discrimination of acoustically confusable phones
More important for GMM-based generative modelsthan neural networks?Neural networks focus on class boundaries ratherthan within-class structure
![Page 27: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/27.jpg)
Drawbacks of context-dependent phone models in neural networks
![Page 28: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/28.jpg)
Drawbacks of context-dependent phone models in neural networks• No distinction between different phones and the
same phone in different contexts• leads to hidden units learning discriminations that are not
useful• discriminations depend on particular state clustering
![Page 29: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/29.jpg)
Drawbacks of context-dependent phone models in neural networks• No distinction between different phones and the
same phone in different contexts• leads to hidden units learning discriminations that are not
useful• discriminations depend on particular state clustering
• Data sparsity
![Page 30: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/30.jpg)
Drawbacks of context-dependent phone models in neural networks• No distinction between different phones and the
same phone in different contexts• leads to hidden units learning discriminations that are not
useful• discriminations depend on particular state clustering
• Data sparsity
• How to avoid these problems?• factorised CDNN• sequence-level discriminative training
![Page 31: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/31.jpg)
Another way?Multitask CD and CI
• Multitask learning – learning CD and CI phones together
• Interpret as regulariser? Compare with pretraining
• unsupervised RBM
• discriminative monophone pretraining (cf curriculum learning)
• Can combine pretraining with multitask learning
6000CD
targets
41 monophone
targets
Acoustic features
![Page 32: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/32.jpg)
Another way?Multitask CD and CI
• Multitask learning – learning CD and CI phones together
• Interpret as regulariser? Compare with pretraining
• unsupervised RBM
• discriminative monophone pretraining (cf curriculum learning)
• Can combine pretraining with multitask learning
6000CD
targets
41 monophone
targets
Acoustic features
Joint workwith Peter Bell
![Page 33: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/33.jpg)
Experiment16
11
12
13
14
15
WER
/ %
Baseline Multitask
no pretrain
no pretrain
RBM pretrain
RBM pretrain
mono pretrain
mono pretrain
TED dev2010+tst2010+tst2011speaker adapted PLP features
![Page 34: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/34.jpg)
Experiment16
11
12
13
14
15
WER
/ %
Baseline Multitask
no pretrain
no pretrain
RBM pretrain
RBM pretrain
mono pretrain
mono pretrain
TED dev2010+tst2010+tst2011speaker adapted PLP features
5% relativeimprovement
![Page 35: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/35.jpg)
MLAN features
• Tandem/bottleneck features from OOD NN
• OOD net trained on Switchboard
OODinputs
OODCD
targets
In-domaininputs
6000CD
targets
41 monophone
targets
![Page 36: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/36.jpg)
Experiment16
11
12
13
14
15
WER
/ %
Baseline Multitask
no pretrain
no pretrainmono pretrain
mono pretrain
TED dev2010+tst2010+tst2011speaker adapted PLP+ MLAN features
![Page 37: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/37.jpg)
Experiment16
11
12
13
14
15
WER
/ %
Baseline Multitask
no pretrain
no pretrainmono pretrain
mono pretrain
TED dev2010+tst2010+tst2011speaker adapted PLP+ MLAN features
3% relativeimprovement
![Page 38: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/38.jpg)
![Page 39: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/39.jpg)
Practical Optimism
![Page 40: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/40.jpg)
Practical Optimism
3-8 hidden layers
~2000 hidden units
~6000 CD phone outputs
~2000 hidden units
9x39 MFCC/LogSpec
![Page 41: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/41.jpg)
Practical Optimism
3-8 hidden layers
~2000 hidden units
~6000 CD phone outputs
~2000 hidden units
9x39 MFCC/LogSpec
WEIGHT SHARING – adaptation, CNNs
ACTIVATION FUNCTIONS – pooling, RELU, gated units
ACOUSTIC INPUT – learning features?
TRAINING – objective function,
optimisation
ARCHITECTURES – convolutional, recurrent
![Page 42: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/42.jpg)
Closing thoughts• Morgan (and Hervé) set the framework within
which all the current HMM/NN acoustic modelling stuff resides
• What a good idea to build a supercomputer to do NN training at scale 25 years ago
• And quite a few of things that seem to have been forgotten are worth remembering
![Page 43: Neural Network Acoustic Modelling Across the Decadeshomepages.inf.ed.ac.uk/srenals/morgan2015-srenals.pdf · BDNN, CDNN, … Neural Network Acoustic Modelling Across the Decades Steve](https://reader031.vdocuments.us/reader031/viewer/2022020315/5acd0cb17f8b9ab10a8d2968/html5/thumbnails/43.jpg)
Closing thoughts• Morgan (and Hervé) set the framework within
which all the current HMM/NN acoustic modelling stuff resides
• What a good idea to build a supercomputer to do NN training at scale 25 years ago
• And quite a few of things that seem to have been forgotten are worth remembering
Thanks!