Brad Story Speech, Language, and Hearing Sciences University of Arizona Research supported by NIH R01-04789 Advances in Speech Synthesis ASA Fall 2009 – San Antonio, TX, 10.27.09 Advances in simulation of sentence-level speech production with kinematic models of the vocal tract and vocal folds

TRANSCRIPT

  • Slide 1

Slide 2
Brad Story, Speech, Language, and Hearing Sciences, University of Arizona. Research supported by NIH R01-04789.
Advances in Speech Synthesis, ASA Fall 2009, San Antonio, TX, 10.27.09.
Advances in simulation of sentence-level speech production with kinematic models of the vocal tract and vocal folds

Slide 3
Goal: Develop a model to facilitate understanding the human sound production system:
- nonlinear interaction of source and filter (vocal tract)
- acoustics produced by the vocal folds & vocal tract: anatomic/physiologic scaling, time-varying changes
- perceptual response to sounds produced by the system
Brad Story, U. of Arizona

Slide 4
To some degree, the model should allow access to three levels: Coordinated movement, Speech Production, Speech Perception.
Implication: Model must produce intelligible speech at the syllable, word or sentence level.

Slide 5
Replicate Natural Speech
- Fleshpoint tracking: temporal patterns of vocal tract shape change, constriction locations, to model parameters
- Tracking acoustic characteristics: acoustic characteristics to parameters for vocal fold vibration and vocal tract shape.
[Figure: U. Wisconsin X-ray microbeam data and formant tracks F1, F2, F3 for "The black cat"; axes: Time, Frequency (Hz)]

Slide 6
Parts of the Model
1. Kinematic vocal fold model: Driven motion of the medial surfaces; glottal flow is interactive with vocal tract pressures
2. Vocal Tract: 1D tubular waveguide
[Figure labels: Trachea; Voice source*]
*Titze, I.R. (2006). The myoelastic aerodynamic theory of phonation, NCVS, pp. 197-214.

Slide 7
1. Model of Vocal Fold Kinematics
Adductory maneuver + vibrational displacement
- Slowly-varying postural component
- Vibrational displacement
[Figure labels: Medial surfaces of the vocal folds; Glottal width; L; T]

Slide 8
1. Model of Vocal Fold Kinematics
Adductory maneuver + vibrational displacement
- Slowly-varying postural component
- Vibrational displacement
[Figure labels: Glottal area; Medial surfaces of the vocal folds; Glottal width]

Slide 9
Example: Typical vibration
[Figure labels: Glottal area; Medial surfaces of the vocal folds]

Slide 10
2. Model of Time-Varying Vocal Tract Shape
Three Categories of Vocal Tract Movements
- Shaping: slowly-varying changes to the shape of the entire vocal tract - vowels
- Valving: modulate vowels with constrictions - consonants
- Tuning: modify parts of the vocal tract shape to enhance voice quality or facilitate voice production
Gracco, V.L. (1992). Perkell, J. (1969). Ohman, S. E. G. (1966; 1967).

Slide 11
TubeTalker*
Hierarchical control tiers
- Tier I: Overall vocal tract shaping
- Tier II: Valving
Composite time-varying vocal tract
*Story (2005). JASA, 117, 3231-3254

Slide 12
Tier I (Shaping): Derived from principal component analysis of vocal tract shapes for a collection of vowels from a specific speaker
[Figure labels: Average (mean) VT shape; [F1, F2] vowel space]
Story and Titze (1998), J. Phonetics; Story (2005), JASA

Slide 13
Tier I (Shaping): Mapping of [F1, F2] frequencies to model coefficients
Interpolate between trajectories
[Figure: "The black cat"; [F1, F2] vowel space; [q1, q2] coeff space]

Slide 14
Tier I: Vowel transitions
Overall shape changes
[Figure: "The black cat"; Postural component; Vocal tract deformation due to articulation]

Slide 15
Tier I: modulation of the overall shape of the vocal tract
Time-varying formant frequencies (continuous)
[Figure: "The black cat", vowel transitions only; F1, F2, F3]

Slide 16
Vocal Fold and Respiratory Parameters
- Fundamental Frequency
- Separation of the vocal folds at the vocal processes
- Respiratory pressure
[Figure: "The black cat"]

Slide 17
Tier II (consonant valving): requires specification of constriction location, degree of closure, and temporal characteristics
[Figure labels: Constriction location/degree; Time course of the constriction]

Slide 18
Constriction location and timing
[Figure: U. Wisconsin X-ray microbeam; Movement in the midsagittal plane; Time-varying cross-distance; "The black cat"]

Slide 19
Tier I - Shaping: vowel shapes are imposed on neutral vocal tract
Tier II - Valving: consonant perturbations are imposed on the vowel substrate
[Figure: "The black cat"]

Slide 20
Shaping + Valving = composite time-varying vocal tract shape
Time-varying formant frequencies
[Figure: "The black cat"; F1, F2, F3; vowel transitions only vs. vowel transitions + consonant perturbations]

Slide 21
Spectrographic Comparison: Simulated/Natural
[Figure panels: Simulated; Natural]

Slide 22
Modification: change constriction location
"The black cat" becomes "The black bat"

Slide 23
Modification: change constriction location and open nasal port
"The black cat" becomes "The black gnat"
[Figure legend: [symbol] = nasal port open]

Slide 24
black cat / black bat / black gnat

Slide 25
Variations of the phrase
- Black cat vowels
- Black cat vowels w/adduct
- Black cat normal
- Black cat widened epilarynx
- Black cat constricted epilarynx
- Black cat long duration, tremor
- Black cat altered F0 contour
- Black cat short VT, high F0
- Black cat shortened VT, increased F0
- Black cat Halloween voice

Slide 26
The End
Operation Black Cat
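
The two-tier idea in the slides (vowel shaping as a mean shape plus two weighted modes with coefficients [q1, q2], then consonant valving imposed on that vowel substrate) can be illustrated with a small sketch. This is not the TubeTalker implementation: the slides give no formulas, so the squared-diameter area, the Gaussian-shaped constriction, the raised-cosine time course, and every function name and number below are assumptions chosen only for illustration.

```python
import math

def shaped_diameters(mean_diam, mode1, mode2, q1, q2):
    """Tier I (shaping): mean shape plus two weighted modes, per tube section.
    The mode vectors stand in for PCA components; values here are hypothetical."""
    return [m + q1 * p1 + q2 * p2 for m, p1, p2 in zip(mean_diam, mode1, mode2)]

def activation(t, onset, duration):
    """Assumed raised-cosine time course: 0 outside the window, 1 at its midpoint."""
    if t <= onset or t >= onset + duration:
        return 0.0
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * (t - onset) / duration))

def constriction_scale(n_sections, location, degree, width):
    """Tier II (valving): per-section multiplier in [0, 1].
    degree = 0 leaves the tract unchanged; degree = 1 closes it at `location`.
    The Gaussian profile is an illustrative assumption."""
    return [1.0 - degree * math.exp(-((i - location) / width) ** 2)
            for i in range(n_sections)]

def composite_areas(diams, scale):
    """Composite shape: circular-section area from diameter, scaled by the valve."""
    return [math.pi / 4.0 * max(d, 0.0) ** 2 * s for d, s in zip(diams, scale)]

# Hypothetical 8-section tract (glottis at index 0, lips at index 7), units arbitrary.
MEAN = [1.0, 1.2, 1.5, 1.8, 2.0, 1.9, 1.6, 1.4]
MODE1 = [0.3, 0.2, 0.1, 0.0, -0.1, -0.2, -0.3, -0.3]
MODE2 = [-0.1, 0.1, 0.2, 0.1, -0.1, -0.2, 0.0, 0.1]

def tract_at(t, q1, q2, location, onset, duration, width=1.0):
    """Area function at time t: vowel substrate with one consonant superimposed."""
    diams = shaped_diameters(MEAN, MODE1, MODE2, q1, q2)
    degree = activation(t, onset, duration)
    scale = constriction_scale(len(diams), location, degree, width)
    return composite_areas(diams, scale)
```

Under these assumptions, moving `location` toward the lip end while keeping the same vowel coefficients and timing mimics the kind of modification the slides describe for changing "cat" to "bat"; the nasal port and the vocal fold model are not represented here.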