Rudy Marsman's thesis presentation slides: Speech synthesis based on a limited speech corpus
![Page 1: Rudy Marsman's thesis presentation slides: Speech synthesis based on a limited speech corpus](https://reader035.vdocuments.us/reader035/viewer/2022070510/58a8bedc1a28abbd6b8b6dff/html5/thumbnails/1.jpg)
Speech synthesis based on a limited speech corpus
Rudy Marsman | VU University | NISV
Netherlands Institute for Sound and Vision (NISV) | Beeld & Geluid
Beeld en Geluid
• Collects, preserves and opens up the Dutch audiovisual heritage for as many users as possible
• One of the largest audiovisual archives in Europe; the institute manages over 70 percent of the Dutch audiovisual heritage
• Was interested in ways to re-use old Polygoonjournaal footage
• Text-to-speech engine based on Philip Bloemendal
Philip Bloemendal
• Famous anchorman
• Iconic voice
• https://www.youtube.com/watch?v=31tClHJ2tfQ
Research
• Can the current corpus of audio recordings of Bloemendal be used to construct a TTS engine?
• What percentage of the Dutch language can be constructed with the current corpus?
• What can we do to improve this?
• How recognizable is the text-to-speech engine as Philip Bloemendal?
• How comprehensible are the constructed audio files?
What percentage of the Dutch language can be constructed with the current corpus?
• Constructing the corpus
  • How many ‘Polygoonjournaals’
  • Openbeelden – OAI (Open Archives Initiative)
  • Extract audio
  • Speech analysis – roughly 35000 distinct words
  • XML files
• Evaluation
  • Metrics
  • Corpora
  • Language changes
What percentage of the Dutch language can be constructed with the current corpus?
• Approach: 4 corpora to test against
  • Contemporary news articles (same domain, different time) | 50 articles
  • News articles from the 1970s (same domain, same time) | 50 articles
  • E-books (different domain, various times) | 6 books
  • Tweets (different domain, different time) | 1000 tweets
• Evaluation
  • Number of distinct words
  • Number of sentences
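The two evaluation counts above can be sketched roughly as follows. This is a minimal illustration, not the thesis code: the tokenizer, the naive sentence splitter, and the rule that a sentence counts as "found" only when every one of its words is in the corpus are all assumptions made for the example.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (letters/digits, inner ' or -)."""
    return re.findall(r"[^\W_]+(?:['-][^\W_]+)*", text.lower())

def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def word_coverage(corpus_vocab, text):
    """Return (number of distinct words, number of those present in the corpus)."""
    distinct = set(tokenize(text))
    return len(distinct), len(distinct & corpus_vocab)

def sentence_coverage(corpus_vocab, text):
    """Return (number of distinct sentences, number fully covered by the corpus).

    Assumption: a sentence is 'found' only if all of its words are in the corpus.
    """
    sentences = set(split_sentences(text))
    found = 0
    for sentence in sentences:
        words = tokenize(sentence)
        if words and all(w in corpus_vocab for w in words):
            found += 1
    return len(sentences), found
```

Here `corpus_vocab` stands for the set of distinct words extracted from the Bloemendal recordings; the test text would be one of the four comparison corpora.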
What can we do to improve performance?
• It is to be expected that many (contemporary) words were never spoken by Philip Bloemendal
• Various approaches
  • Change format (lowercase, diaereses)
  • Numbers
  • Finding synonyms
  • Decompounding
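The format and number steps could look something like the sketch below. The slides do not say how diaereses or numbers were actually handled, so both functions are assumptions: `normalize` lowercases a word and drops the diaeresis, and `number_to_dutch` (a hypothetical helper, limited to 0 to 99) spells out a number so it can be looked up as a word.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"

def normalize(word):
    """Lowercase a word and drop diaereses, keeping other accents intact."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    stripped = decomposed.replace(COMBINING_DIAERESIS, "")
    return unicodedata.normalize("NFC", stripped)

ONES = ["nul", "een", "twee", "drie", "vier", "vijf", "zes", "zeven", "acht",
        "negen", "tien", "elf", "twaalf", "dertien", "veertien", "vijftien",
        "zestien", "zeventien", "achttien", "negentien"]
TENS = {2: "twintig", 3: "dertig", 4: "veertig", 5: "vijftig",
        6: "zestig", 7: "zeventig", 8: "tachtig", 9: "negentig"}

def number_to_dutch(n):
    """Spell out an integer from 0 to 99 in Dutch (illustrative range only)."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0..99")
    if n < 20:
        return ONES[n]
    tens, unit = divmod(n, 10)
    if unit == 0:
        return TENS[tens]
    # Dutch puts the unit first: 'een-en-twintig'; after a final 'e' the
    # joining 'en' is written with a diaeresis ('tweeëntwintig').
    joiner = "ën" if ONES[unit].endswith("e") else "en"
    return ONES[unit] + joiner + TENS[tens]
```

Whether the corpus transcripts keep or drop diaereses determines which form should be looked up, so the normalization would need to be applied consistently to both sides.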
Finding Synonyms
• Open Dutch Wordnet: Dutch lexical semantic database
• Marten Postma et al.
• Yields synsets (e.g. Hoofdmeester -> Rector, Schoolhoofd)
• Computationally expensive
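The synonym fallback can be illustrated with a toy lookup. The `SYNSETS` list below is a hand-written stand-in built from the slide's example; it does not use the real Open Dutch Wordnet interface, and `find_substitute` is a hypothetical name.

```python
# Toy stand-in for synsets; real synsets would come from Open Dutch Wordnet.
SYNSETS = [
    {"hoofdmeester", "rector", "schoolhoofd"},
]

def find_substitute(word, corpus_vocab, synsets=SYNSETS):
    """Return the word if the corpus has it, else a synonym it has, else None."""
    if word in corpus_vocab:
        return word
    for synset in synsets:
        if word in synset:
            # Sort so the choice is deterministic when several synonyms exist.
            candidates = sorted(w for w in synset if w in corpus_vocab)
            if candidates:
                return candidates[0]
    return None
```

Scanning every synset for every missing word is one reason such a lookup gets expensive; a word-to-synset index built once up front would be a natural refinement.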
Decompounding
• The Dutch language allows for compounding words
  • School, hoofd -> Schoolhoofd
  • Regen, water -> regenwater
  • Staat, hoofd -> StaatShoofd
• Each word is distinct in the corpus
• Decompounding is computationally expensive, especially for large corpora and long words
• Constructed bigrams and trigrams
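One way to picture decompounding is a recursive split that accepts a word only if every part occurs in the corpus. The sketch below is an assumption about how this could work, not the method used in the thesis; the optional linking "s" mirrors the Staat + hoofd example, and `min_part` is an illustrative guard against very short fragments.

```python
from functools import lru_cache

def decompound(word, corpus_vocab, min_part=3):
    """Split `word` into parts that all occur in corpus_vocab, or return None.

    An 's' between two parts is treated as an optional linking element.
    """
    vocab = frozenset(corpus_vocab)

    @lru_cache(maxsize=None)
    def split(rest):
        if rest in vocab:
            return (rest,)
        for i in range(min_part, len(rest) - min_part + 1):
            head, tail = rest[:i], rest[i:]
            if head not in vocab:
                continue
            # Try the tail as-is first, then without a leading linking 's'.
            options = [tail]
            if tail.startswith("s") and len(tail) - 1 >= min_part:
                options.append(tail[1:])
            for option in options:
                parts = split(option)
                if parts is not None:
                    return (head,) + parts
        return None

    return split(word.lower())
```

Every candidate split position is tried and each tail is split again, which is where the cost for long words comes from; the memoization only softens it. Note that dropping the linking "s" means the synthesized audio would lack that sound.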
Results (words)
| Dataset | Unique words | Unique words found | After synsets | After decompounding |
|---|---|---|---|---|
| Contemporary news | 2743 | 2019 | 2106 | 2448 |
| Old news | 16191 | 7703 | 8261 | 11541 |
| Tweets | 27180 | 7692 | 8446 | 13440 |
| Books | 26575 | 11440 | 12922 | 20207 |
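To read the word results as coverage percentages, the counts above can be divided by the number of unique words per dataset. The snippet only restates the slide's numbers; the names `RESULTS` and `coverage_percentages` are made up for the example. For contemporary news, for instance, coverage rises from about 74% to about 89% after decompounding.

```python
# (unique words, found directly, after synsets, after decompounding)
RESULTS = {
    "Contemporary news": (2743, 2019, 2106, 2448),
    "Old news": (16191, 7703, 8261, 11541),
    "Tweets": (27180, 7692, 8446, 13440),
    "Books": (26575, 11440, 12922, 20207),
}

def coverage_percentages(results):
    """Map each dataset to its three coverage percentages, rounded to 0.1."""
    return {
        name: tuple(round(100 * count / total, 1) for count in counts)
        for name, (total, *counts) in results.items()
    }

for name, percentages in coverage_percentages(RESULTS).items():
    print(name, percentages)
```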
Results (sentences)
| Dataset | Unique sentences | Unique sentences found | After synsets | After decompounding |
|---|---|---|---|---|
| Contemporary news | 1022 | 106 | 110 | 186 |
| Old news | 2626 | 183 | 190 | 301 |
| Tweets | 8937 | 174 | 181 | 296 |
| Books | 56106 | 9387 | 11385 | 18271 |
How comprehensible / recognizable are the sentences?
• 8 people tested the software
• Philip was recognized (or ‘that news guy’)
• Words with more consonants were easier to recognize
• Recognition was higher when users entered their own sentences
• Recognition was lower when sentences were played without subtitles
• Speed of the software / GUI limited testing capabilities
The use of Deep Neural Networks in colorizing video
Rudy Marsman | VU University | NISV
Neural Networks
• Recent progress in computational power has made the implementation of Deep Neural Nets possible
• A neural net trained on a large training set can make accurate predictions on real-world examples
Zhang et al.
• Richard Zhang et al. trained a neural net to colorize images
• Trained on over a million images
• Fools humans into thinking a colorized photo is the original 20% of the time
• Resizes the image to fit an input layer of 200x200 pixels
• Gained popularity on news websites and forums
Zhang et al.
Implementation on video
• Extract individual frames from video using FFMPEG
• Colorize each individual frame
• Re-compile video and attach original audio file
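The three steps above can be sketched as a small wrapper around the `ffmpeg` command line. This is an illustration under assumptions: the file layout, the 25 fps default, the H.264 output settings, and the `colorize_frame` callable (a placeholder for whatever runs the colorization model on one image) are not taken from the slides.

```python
import subprocess
from pathlib import Path

def extract_cmd(video, frames_dir):
    """ffmpeg command that writes every frame as a numbered PNG."""
    return ["ffmpeg", "-i", str(video), str(Path(frames_dir) / "%06d.png")]

def rebuild_cmd(colored_dir, original_video, output, fps=25):
    """ffmpeg command that joins frames and copies the original audio track."""
    return [
        "ffmpeg",
        "-framerate", str(fps),                     # must match the source video
        "-i", str(Path(colored_dir) / "%06d.png"),  # input 0: colorized frames
        "-i", str(original_video),                  # input 1: source of the audio
        "-map", "0:v", "-map", "1:a?",              # '?' tolerates missing audio
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "copy",
        str(output),
    ]

def colorize_video(video, workdir, output, colorize_frame, fps=25):
    """Extract frames, colorize each one independently, then rebuild the video."""
    frames = Path(workdir) / "frames"
    colored = Path(workdir) / "colored"
    frames.mkdir(parents=True, exist_ok=True)
    colored.mkdir(parents=True, exist_ok=True)
    subprocess.run(extract_cmd(video, frames), check=True)
    for frame in sorted(frames.glob("*.png")):
        # colorize_frame(src_path, dst_path) is a hypothetical model wrapper.
        colorize_frame(frame, colored / frame.name)
    subprocess.run(rebuild_cmd(colored, video, output, fps), check=True)
```

Because each frame passes through the model on its own, nothing in this loop ties the colors of one frame to the next.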
Example
• https://www.youtube.com/watch?v=olsO2rOy_i4
Applications
• Colorized videos are more ‘tangible’ and ‘alive’ than black-and-white ones
• Showing colorized Polygoonjournaals can augment the TTS engine
• Generally positive responses to the technology may increase attention to the NISV collection
• NISV employees were enthusiastic
Issues
• Each frame is treated as independent and colorized separately
• Artifacts appear between frames
• Slow performance without an Nvidia GPU
• Low resolution
• Predicted colors are still far from perfect
Conclusions
• The current corpus covers many frequently used words
• The various implemented approaches increase coverage
• Low coverage for sentences -> real-world approach may need improvement
• Audio is recognizable and understandable
• Neural networks may be used to colorize video footage
Discussion