open source toolkit for speech to text...
TRANSCRIPT
![Page 1: Open Source Toolkit for Speech to Text Translationufal.mff.cuni.cz/mtm18/files/pbml-mtm-presentation-01-SLT-KIT.pdf · Quan Pham, Sebastian Stüker, Alex Waibel . 2 16.08.18 Jan Niehues](https://reader030.vdocuments.us/reader030/viewer/2022041222/5e0bc480de1404320538c5b8/html5/thumbnails/1.jpg)
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association
Institute for Antrophomatics
www.kit.edu
Open Source Toolkit for Speech to Text TranslationThomas Zenkel, Matthias Sperber, Jan Niehues, Markus Müller, Ngoc-
Quan Pham, Sebastian Stüker, Alex Waibel
![Page 2: Open Source Toolkit for Speech to Text Translationufal.mff.cuni.cz/mtm18/files/pbml-mtm-presentation-01-SLT-KIT.pdf · Quan Pham, Sebastian Stüker, Alex Waibel . 2 16.08.18 Jan Niehues](https://reader030.vdocuments.us/reader030/viewer/2022041222/5e0bc480de1404320538c5b8/html5/thumbnails/2.jpg)
Institute for Anthropomatics2 16.08.18 Jan Niehues - S2T Translation
Motivation
• Speech translation interesting challenge• Neural models• End-to-End models
• Provide a baseline• Cascade of several models
• Easy to extend• Develop models for part
• Easy to use• Download pretrained models
![Page 3: Open Source Toolkit for Speech to Text Translationufal.mff.cuni.cz/mtm18/files/pbml-mtm-presentation-01-SLT-KIT.pdf · Quan Pham, Sebastian Stüker, Alex Waibel . 2 16.08.18 Jan Niehues](https://reader030.vdocuments.us/reader030/viewer/2022041222/5e0bc480de1404320538c5b8/html5/thumbnails/3.jpg)
Institute for Anthropomatics3 16.08.18 Jan Niehues - S2T Translation
Cascade Spoken Language Translation
• Serial combination of severalmodels
• ASR• Audio → Text
• Segmentation• Add case information• Add punctuation information
• Machine translation• Source language →
target language
![Page 4: Open Source Toolkit for Speech to Text Translationufal.mff.cuni.cz/mtm18/files/pbml-mtm-presentation-01-SLT-KIT.pdf · Quan Pham, Sebastian Stüker, Alex Waibel . 2 16.08.18 Jan Niehues](https://reader030.vdocuments.us/reader030/viewer/2022041222/5e0bc480de1404320538c5b8/html5/thumbnails/4.jpg)
Institute for Anthropomatics4 16.08.18 Jan Niehues - S2T Translation
CTC-based ASR
• Input:• 40 dimensional Mel-filterbank features
• Output:• Byte-pair units (300 or 10000)
• Model:• 4-layer Bi-LSTM• Softmax layer
• Trained using CTC loss function
![Page 5: Open Source Toolkit for Speech to Text Translationufal.mff.cuni.cz/mtm18/files/pbml-mtm-presentation-01-SLT-KIT.pdf · Quan Pham, Sebastian Stüker, Alex Waibel . 2 16.08.18 Jan Niehues](https://reader030.vdocuments.us/reader030/viewer/2022041222/5e0bc480de1404320538c5b8/html5/thumbnails/5.jpg)
Institute for Anthropomatics5 16.08.18 Jan Niehues - S2T Translation
Encoder-Decoder Based ASR
• XNMT-based implementation• Input:
• 40 dimensional Mel-filterbank features
• Encoder:• 4-layer bidirectional pyramidal
encoder• Decoder:
• One-layer bidirectional decoder
![Page 6: Open Source Toolkit for Speech to Text Translationufal.mff.cuni.cz/mtm18/files/pbml-mtm-presentation-01-SLT-KIT.pdf · Quan Pham, Sebastian Stüker, Alex Waibel . 2 16.08.18 Jan Niehues](https://reader030.vdocuments.us/reader030/viewer/2022041222/5e0bc480de1404320538c5b8/html5/thumbnails/6.jpg)
Institute for Anthropomatics6 16.08.18 Jan Niehues - S2T Translation
Segmentation and Punctuation
• Monolingual machine translation system• Add punctuation and case
• Example:• Input:
• i felt wor@@ se why i wro@@ te a who@@ le book
• Output:• U L L. U? U L L L L• I felt worse. Why? I wrote a whole book
• Preprocessing:• Randomly split training data and
remove punctuation information
• OpenNMT-based model
![Page 7: Open Source Toolkit for Speech to Text Translationufal.mff.cuni.cz/mtm18/files/pbml-mtm-presentation-01-SLT-KIT.pdf · Quan Pham, Sebastian Stüker, Alex Waibel . 2 16.08.18 Jan Niehues](https://reader030.vdocuments.us/reader030/viewer/2022041222/5e0bc480de1404320538c5b8/html5/thumbnails/7.jpg)
Institute for Anthropomatics7 16.08.18 Jan Niehues - S2T Translation
Machine Translation
• OpenNMT-based model• RNN-based Encoder and Decoder
• Preprocessing:• Tokenizer• Byte-pair encoding
• Mid-size model:• Pre-training on all data• Adaptation to in-domain data using
continue training
![Page 8: Open Source Toolkit for Speech to Text Translationufal.mff.cuni.cz/mtm18/files/pbml-mtm-presentation-01-SLT-KIT.pdf · Quan Pham, Sebastian Stüker, Alex Waibel . 2 16.08.18 Jan Niehues](https://reader030.vdocuments.us/reader030/viewer/2022041222/5e0bc480de1404320538c5b8/html5/thumbnails/8.jpg)
Institute for Anthropomatics8 16.08.18 Jan Niehues - S2T Translation
Data
• Scripts to download and preprocessdefault data
• Audio:• TED LIUM corpus
• Text:• Small model:
• WIT corpus• Midsize model:
• EPPS corpus• WIT corpus
![Page 9: Open Source Toolkit for Speech to Text Translationufal.mff.cuni.cz/mtm18/files/pbml-mtm-presentation-01-SLT-KIT.pdf · Quan Pham, Sebastian Stüker, Alex Waibel . 2 16.08.18 Jan Niehues](https://reader030.vdocuments.us/reader030/viewer/2022041222/5e0bc480de1404320538c5b8/html5/thumbnails/9.jpg)
Institute for Anthropomatics9 16.08.18 Jan Niehues - S2T Translation
Results
• Evaluation tool to calculate 4 metrics provided• BLEU, TER, CharacTER, BEER• Automatic re-segmentation
Model dev2010 tst2010 tst2013 tst2014
Attention 13.42 13.57 12.04 11.88
CTC 300 12.33 11.88 12.47 11.49
CTC 10K 13.04 13.44 13.41 12.58
Rover 13.98 14.08 13.73 13.23
![Page 10: Open Source Toolkit for Speech to Text Translationufal.mff.cuni.cz/mtm18/files/pbml-mtm-presentation-01-SLT-KIT.pdf · Quan Pham, Sebastian Stüker, Alex Waibel . 2 16.08.18 Jan Niehues](https://reader030.vdocuments.us/reader030/viewer/2022041222/5e0bc480de1404320538c5b8/html5/thumbnails/10.jpg)
Institute for Anthropomatics10 16.08.18 Jan Niehues - S2T Translation
Conclusion
• Combination of several toolkits to build full speech translation toolkit
• Easy usage:• Dockerized
• Applications• Apply pre-trained models• Train models using provided data (IWSLT)• Train models on own data
• Link:• https://github.com/isl-mt/SLT.KIT