SPEECH RECOGNITION 2
DAY 15 – SEPT 30, 2013
Brain & Language
LING 4110-4890-5110-7960
NSCI 4110-4891-6110
Harry Howard
Tulane University
Course organization
• The syllabus, these slides and my recordings are available at http://www.tulane.edu/~howard/LING4110/.
• If you want to learn more about EEG and neurolinguistics, you are welcome to participate in my lab. This is also a good way to get started on an honors thesis.
• The grades are posted to Blackboard.
9/30/13 Brain & Language, Harry Howard, Tulane University
REVIEW
Review
Pitch shows fundamental frequency (F0)
Spectrogram shows formants (F1-3)
Sound wave
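As a rough illustration of what a pitch track measures, the sketch below builds a synthetic periodic wave and recovers its F0 by autocorrelation. This is a minimal sketch under stated assumptions, not the algorithm Praat uses: the function names `synth_tone` and `estimate_f0`, the 8000 Hz sample rate, and the 75-500 Hz search range are all illustrative choices.

```python
import math

def synth_tone(f0_hz, sample_rate, duration_s, n_harmonics=5):
    """Sum of harmonics of f0_hz with falling amplitude (a crude voiced sound)."""
    n = int(sample_rate * duration_s)
    return [
        sum(math.sin(2 * math.pi * f0_hz * h * t / sample_rate) / h
            for h in range(1, n_harmonics + 1))
        for t in range(n)
    ]

def estimate_f0(samples, sample_rate, fmin=75.0, fmax=500.0):
    """Estimate F0 (Hz) as sample_rate / lag, for the lag with the highest autocorrelation."""
    min_lag = int(sample_rate / fmax)   # shortest period considered
    max_lag = int(sample_rate / fmin)   # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the wave with a copy of itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Hypothetical test signal: 50 ms of a 200 Hz tone sampled at 8000 Hz.
wave = synth_tone(f0_hz=200.0, sample_rate=8000, duration_s=0.05)
print(round(estimate_f0(wave, 8000)))
```

The wave repeats every 40 samples, so the autocorrelation peaks at a lag of 40 and the estimate comes out at 8000 / 40 = 200 Hz. Real speech is only quasi-periodic, so practical pitch trackers add windowing and voicing decisions that this sketch omits.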
SPEECH RECOGNITION
Ingram §5
• use Praat in class
Vowel articulation
• Tongue height: high, (mid), low
  • put your hand under your jaw and say the vowel of:
    • mat, met, mate, mitt, meat
    • meat, mitt, mate, met, mat
• Tongue advancement: front, central, back
• Lip configuration: rounded, neutral, retracted
Vowel description
          Front       Central     Back
High      i ɪ                     u ʊ
(Mid)     e ɛ         ɝ ə ɚ ʌ     o ɔ
Low       æ                       a
          Retracted   Neutral     Rounded
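The chart above can be read as a lookup table from a vowel symbol to its articulatory description. The sketch below encodes a subset of the vowels that way; the names `Vowel`, `LIPS`, `VOWELS`, `describe` and `same_class` are illustrative, and tying lip configuration to the column follows the bottom row of the chart rather than any independent claim about each vowel.

```python
from typing import NamedTuple

class Vowel(NamedTuple):
    height: str       # "high", "mid", or "low"
    advancement: str  # "front", "central", or "back"

# Lip configuration by column, following the bottom row of the chart.
LIPS = {"front": "retracted", "central": "neutral", "back": "rounded"}

# A subset of the chart, enough to illustrate the lookup.
VOWELS = {
    "i": Vowel("high", "front"),  "ɪ": Vowel("high", "front"),
    "u": Vowel("high", "back"),   "ʊ": Vowel("high", "back"),
    "e": Vowel("mid", "front"),   "ɛ": Vowel("mid", "front"),
    "ə": Vowel("mid", "central"), "ʌ": Vowel("mid", "central"),
    "o": Vowel("mid", "back"),    "ɔ": Vowel("mid", "back"),
    "æ": Vowel("low", "front"),
}

def describe(symbol):
    """Return 'height advancement lips' for a vowel symbol."""
    v = VOWELS[symbol]
    return f"{v.height} {v.advancement} {LIPS[v.advancement]}"

def same_class(symbol):
    """Other vowels with the same height and advancement as `symbol`."""
    target = VOWELS[symbol]
    return sorted(s for s, v in VOWELS.items() if v == target and s != symbol)

print(describe("u"))      # high back rounded
print(same_class("i"))    # ['ɪ']
```

Note that `same_class("i")` returns `['ɪ']`: height, advancement and lip configuration alone do not separate the two vowels that share a cell, so a further distinction is needed to tell them apart.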
Sample vowel spectrograms
• Wide band spectrograms of the vowels of American English in a /b__d/ context.
• Top row, left to right: [i, ɪ, eɪ, ɛ, æ]. Bottom row, left to right: [ɑ, ɔ, o, ʊ, u].
Acoustic cues and distinctive features
• Three problems
  a. Input signal
  b. Internal representation
  c. Interface between (a) and (b)
• Lexical information retrieval
  • but we only need the phonological form of a lexical item
Why speech recognition is difficult
• The segmentation problem
• The variability problem
  • coarticulation
• The speaking environment
• Speakers’ vocal tracts
• Speech rate and style
• Rate of information transmission
Lexical retrieval
• Speech perception involves phonological parsing prior to lexical access
  • It is not enough to know the lexicon beforehand.
• Phonetic forms and phonological representations
• Speech/speaker normalization
• Distinctive features and acoustic cues
• Underspecified vs. fully specified
• Discrete vs. continuous
• Hierarchical organization vs. entrainment
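Speaker normalization is listed above without a specific procedure. One common approach in phonetics, z-scoring each formant within a speaker (Lobanov normalization), can be sketched as follows. It is swapped in here purely for illustration and is not necessarily what the lecture or Ingram has in mind. The formant values are made-up round numbers, and speaker B is simply speaker A scaled by 1.2 as a crude stand-in for a shorter vocal tract.

```python
from statistics import mean, pstdev

def lobanov(vowels):
    """Map {vowel: (F1, F2)} in Hz to {vowel: (z1, z2)} for one speaker."""
    symbols = list(vowels)
    normalized = {s: [] for s in symbols}
    for k in range(2):  # k = 0 for F1, k = 1 for F2
        values = [vowels[s][k] for s in symbols]
        mu, sigma = mean(values), pstdev(values)
        for s in symbols:
            # Express each formant relative to this speaker's own mean and spread.
            normalized[s].append((vowels[s][k] - mu) / sigma)
    return {s: tuple(z) for s, z in normalized.items()}

# Hypothetical formant values in Hz (not measurements).
speaker_a = {"i": (300.0, 2300.0), "æ": (700.0, 1700.0), "u": (320.0, 900.0)}
speaker_b = {s: (f1 * 1.2, f2 * 1.2) for s, (f1, f2) in speaker_a.items()}

za, zb = lobanov(speaker_a), lobanov(speaker_b)
for s in speaker_a:
    print(s, [round(z, 2) for z in za[s]], [round(z, 2) for z in zb[s]])
```

Although the raw formants of the two hypothetical speakers differ by 20%, their normalized values coincide, which is the sense in which normalization maps different phonetic forms onto one representation. Real speakers do not differ by a uniform scale factor, so the match is only approximate in practice.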
NEXT TIME
Finish Ingram §6.
☞ Go over questions at end of chapter.