

Speech Articulation Assessment using Dynamic Magnetic Resonance Imaging Techniques

S. R. Ventura
School of Allied Health Science – Porto Polytechnic Institute, V.N. Gaia, Portugal

M. J. M. Vasconcelos
Faculty of Engineering, University of Porto, Porto, Portugal

D. R. Freitas
Faculty of Engineering, University of Porto, Porto, Portugal

I. M. Ramos
Radiology Service, St. John Hospital and Faculty of Medicine, University of Porto, Porto, Portugal

João Manuel R. S. Tavares
Faculty of Engineering, University of Porto, Porto, Portugal

ABSTRACT: Magnetic Resonance Imaging (MRI) has been successfully applied to the real-time analysis of the articulators during speech production along the whole vocal tract, with good signal-to-noise ratio and without ionizing effects. Because dynamic speech events require a minimal sampling rate, an improvement in the temporal resolution of MRI systems is needed. Our aim is to describe a dynamic MRI technique to acquire and assess the main articulatory events during the production of some European Portuguese utterances. Hence, new insights into the dynamic MRI technique using a 3.0 Tesla system are presented in order to study the shape of the vocal tract during speech production.

KEYWORDS: Image analysis, Medical imaging, Speech production, Dynamic techniques.

1 INTRODUCTION

1.1 Speech production analysis and challenges

The speech production mechanism is a complex human motor activity that achieves voice modulation and produces speech based mainly on the movements of the articulators. The organs involved, mostly formed of soft tissues, such as the tongue, the lips, the velum and the pharynx, play extremely important roles during speech production. In fact, these organs, together with some bones, i.e. the palate and the jaw, modify the resonance cavities and the shape of the vocal tract in order to produce sounds.

The human vocal tract’s shape (Figure 1) differs among subjects and presents an irregular contour defined by the boundaries between air and soft tissues. This tube extends from the lips to the glottis and is formed by four main structures: the oral cavity, the nasal cavity, the velum and the pharynx. The tongue is the most important articulator, mainly because it is the largest one and performs a wide range of slow and fast movements during speech production.

Many approaches have been used to track and observe the movements of the articulators, in particular of the tongue, but most of them employ sensors (e.g. electromagnetic articulography) or direct contact with the tongue and the palate (e.g. electropalatography).

Figure 1. The shape of the vocal tract during the production of the vowel [ɛ] in an image acquired by a 3.0 Tesla MR system.

Magnetic Resonance Imaging (MRI) has been successfully applied to the real-time analysis of the articulators during speech production, along the whole vocal tract, with good signal-to-noise ratio and without ionizing effects. As such, several MRI techniques have allowed the calculation of cross-sectional areas and volumes directly from static postures during sustained articulations (Kim et al. 2009) or using multiple repetitions (Stone et al. 2001, Shadle et al. 1999).

Because dynamic speech events require a minimal sampling rate, and considering the number of articulators involved, two challenges must be addressed for an accurate examination, namely:


a) The development of a specific trigger for image acquisition to improve the temporal resolution of MRI systems;
b) Audio recording in real time during MRI acquisitions, allowing the acoustic analysis of the sounds and its correlation with the images.

Our aim is to describe a dynamic MRI technique to acquire and assess the main articulatory events during the production of some European Portuguese (EP) sounds.

The remainder of this paper is organized as follows. The next section describes the MRI protocol, the speech corpus, and the image analysis and assessment. Then, the articulatory measurements obtained for two subjects by using deformable models are presented and discussed. Finally, the conclusions and future outlook are given in the last section.

1.2 Dynamic MRI techniques

A few methods and applications of dynamic MRI have been presented, based on a synchronized sampling method (Parthasarathy et al. 2007) or a tagging technique (Stone et al. 2001), and achieving images at rates of 7 to 10 frames per second (Demolin et al. 2006, Stone et al. 2001, Mády et al. 2001, Engwall 2004) and even of 18 (Parthasarathy et al. 2007) and 24 frames per second (Narayanan et al. 2004). According to Shadle et al. (1999), dynamic MRI is a potentially useful tool that allows the tracking of the movements of the vocal tract organs. In addition, several morphological aspects can be studied, for example the motion of the tongue’s surface or contour (Stone et al. 2001), the kinematic parameters of the tongue (i.e. velocity, principal strains) (Parthasarathy et al. 2007), the shape of the tongue (Avila-García et al. 2004) and characteristic distances (i.e. articulatory parameters) (Ventura et al. 2011, Echternach et al. 2010). In this area, dynamic MRI can also allow the assessment of articulatory impairments following surgery to structures of the oral cavity, such as for cancer treatment (Mády et al. 2001).

In a previous work, we presented a technique for the dynamic study of the vocal tract with MRI that uses the heartbeat signal to synchronize and trigger the image acquisition process (Ventura et al. 2011). That study revealed significant variability in sound production among subjects. This variability is due not only to individual anatomic differences, but also to the peculiarities of each subject’s movement and gesture control, which were observed to be highly individual.

Due to the developments that have occurred in MRI, namely the use of 3.0 Tesla magnetic fields, new applications and image refinements are expected, and consequently significant improvements in the quality of the data acquired on the articulatory events during speech production. Because dynamic speech events demand a minimal sampling rate of around 20 Hz (Narayanan et al. 2004), an improvement in the temporal resolution of MRI systems is needed, for example by using k-space sampling strategies or more efficient triggering techniques.
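As a rough illustration of what the frame rates quoted in this section mean in practice, the following Python sketch converts each rate into the time between consecutive images and counts how many images would fall within a short articulatory event. The 20 Hz threshold is the value cited from Narayanan et al. (2004); the 100 ms event duration and all function names are assumptions made only for this example.

```python
import math

# Minimal sampling rate quoted in the text (Narayanan et al. 2004).
MIN_RATE_HZ = 20.0

# Frame rates quoted in the text, in frames per second.
REPORTED_RATES_FPS = (7, 10, 18, 24)


def frame_interval_ms(fps: float) -> float:
    """Time between consecutive images, in milliseconds."""
    return 1000.0 / fps


def frames_in_event(fps: float, duration_s: float) -> int:
    """Number of complete frame intervals that fit within an event."""
    # The small epsilon guards against floating-point round-off.
    return math.floor(fps * duration_s + 1e-9)


def meets_minimum(fps: float, minimum_hz: float = MIN_RATE_HZ) -> bool:
    """True when a frame rate reaches the minimal sampling rate."""
    return fps >= minimum_hz


if __name__ == "__main__":
    event_s = 0.1  # hypothetical duration of a short articulatory event
    for fps in REPORTED_RATES_FPS:
        print(f"{fps:>2} fps: {frame_interval_ms(fps):6.1f} ms between images, "
              f"{frames_in_event(fps, event_s)} image(s) in {event_s * 1000:.0f} ms, "
              f"meets {MIN_RATE_HZ:.0f} Hz: {meets_minimum(fps)}")
```

Of the rates listed, only 24 frames per second exceeds the 20 Hz threshold, which is consistent with the need for better temporal resolution stated above.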

1.3 Vocal tract modeling

The implementation of statistical methods to analyse data from speech production has proved to be valuable in several studies. To name a few, Harshman et al. (1977) used component analysis to identify a set of articulatory features of the tongue. Later, Maeda (1988) applied factor analysis in order to describe the lateral shapes of the vocal tract, and Stone et al. (1997) employed principal component analysis to examine the sagittal contours of the tongue from ultrasound images.

MRI of the vocal tract, associated with the use of statistical deformable models, has made possible the automatic extraction of the vocal tract’s shape from the acquired images and the computation of articulatory measurements that can be useful in improving computational speech models (Vasconcelos et al. 2010a, b).

2 METHODS

2.1 Equipment, subjects and speech corpus

The image data were acquired using a MAGNETOM Trio 3.0 Tesla MR system and two integrated coils (a 32-channel head coil and a 4-channel neck matrix coil), with the subjects in supine position.

In accordance with the safety procedures for MRI, a questionnaire was administered to screen for contraindications. In addition, the subjects were informed and instructed beforehand about the study to be performed, and informed consent was obtained.

Two young female volunteers, without articulatory disorders, were trained before the MRI exam to ensure the proper production of the intended sounds, and audio recordings were performed before image acquisition, also in supine position.

The speech corpus consisted of two sequences of sounds of the EP language, in two different articulatory contexts:
i) Vowel-vowel (VV) articulation;
ii) A set of consonant-vowel (CV) articulations during a word utterance.

The first articulatory context included the five oral vowels [a ɛ i ɔ u] and the second, the word /pato/ (the English word “duck”, IPA phonetic transcription [patu]). This choice was made because these sounds are familiar to any Portuguese speaker and are easy to articulate. Furthermore, the second utterance allowed a coarticulation study of stops in a consonant-vowel context.

The two audio spectrograms in Figures 2 and 3 illustrate the two sequences of EP sounds recorded from the first subject. As can be seen, the vowel sequence has a total duration of 5.05 seconds, and the word /pato/ lasts 2.41 seconds.

Figure 2. Spectrogram for the first articulatory context in PRAAT software.

Figure 3. Spectrogram for the second articulatory context in PRAAT software.

2.2 Dynamic MRI technique

Concerning the data acquisition, two approaches were attempted: 1) rapid pulse sequences and 2) a tagging technique. “Tagging” is a method that aids the tracking of the motion of objects in MRI series. The inserted tags appear as dark regions in the images that move together with the object under analysis. As such, this technique is particularly valuable in cardiac imaging, as the tissue of the heart’s walls provides few natural features for motion tracking. For speech articulatory assessment, this technique is very difficult to use, and more tests must be performed in order to improve the temporal resolution and image quality required for this purpose. Hence, only rapid pulse sequences combined with parallel imaging were used and tested, as a compromise between the temporal resolution and the signal-to-noise ratio of the MRI system.

Using a FLASH gradient-echo sequence, 100 midsagittal T1-weighted (WT1) slices were acquired during 48 seconds for each repeated utterance. The MRI protocol parameters used are indicated in Table 1.

Table 1. MRI protocol parameters used in the dynamic study of speech production.

Parameter                                  2D dynamic imaging technique
TR (ms)                                    6.4
TE (ms)                                    2.44
Flip angle                                 10°
Number of averages                         1
Slice thickness (mm)                       6
Field of view (mm)                         178 x 220
Matrix                                     156 x 192
Acceleration factor (parallel imaging)     4
Image resolution (pixels per mm)           0.873
Pixel spacing (mm)                         1.146 x 1.146

For the target utterance consisting of the five oral vowels, each sound occurred between 13 and 37 times per sequence, depending on the speech rate and on the word length. The vowel [a] occurred 27 and 37 times per sequence for each subject. The vowels [ɔ u] were the sounds with the lowest occurrence rates. For the second target word, each sound occurred between 13 and 42 times per sequence: for each subject, the vowel [a] occurred about 35 to 42 times and the plosive consonant [t] 13 to 16 times.

2.3 Deformable models

In order to perform a more robust analysis of the articulatory behavior during speech production, statistical Point Distribution Models (PDMs) (Vasconcelos et al. 2008, 2010a) were used to automatically identify, i.e. segment, the vocal tract’s key points in order to compute descriptive measures.

In the building of the PDMs (Cootes et al. 1992), the manual tracing of the key points was carried out by one of the authors with medical imaging knowledge, on images sequentially displayed on the computer screen, and was later cross-checked by another author. The labeling was performed according to the anatomic location of the vocal tract articulators (Figure 2).

In the statistical modelling method used, all the training examples are aligned into a standard coordinate frame and a Principal Component Analysis is applied to the coordinates of the landmark points. This produces the mean position for each landmark and a description of the main ways in which these points tend to move together (Vasconcelos et al. 2008, 2010a).

The local grey-level behavior of each landmark point can also be considered in the modeling of a shape (Cootes et al. 1993). Thus, statistical information is obtained about the mean and covariance of the grey values of the pixels around each landmark point. This information is used to construct the appearance models in Active Appearance Models, which can be used to identify the modeled shape in new images.

Active Appearance Models were presented in (Cootes et al. 1998) and allow the building of texture and appearance models. These models are generated by combining a model of shape variation (a geometric model) with a model of the appearance variations in a shape-normalized frame. To identify the modelled shape in new images, the method uses the difference between the current estimate of appearance and the target image to drive an optimization process.
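The shape-modelling step described in this section (alignment of the training shapes followed by a Principal Component Analysis of the landmark coordinates) can be sketched in Python with NumPy as follows. This is a minimal illustration and not the implementation used in the study: shapes are aligned to the first training shape rather than iteratively to the mean, reflections are not excluded, and the function names and the 95% variance threshold are assumptions chosen for the example.

```python
import numpy as np


def align_shapes(shapes: np.ndarray) -> np.ndarray:
    """Align shapes of size (n_shapes, n_points, 2) to the first shape.

    Translation and scale are removed, then each shape is rotated onto the
    reference with an orthogonal Procrustes fit (simplified alignment).
    """
    centred = shapes - shapes.mean(axis=1, keepdims=True)
    scaled = centred / np.linalg.norm(centred, axis=(1, 2), keepdims=True)
    reference = scaled[0]
    aligned = []
    for shape in scaled:
        u, _, vt = np.linalg.svd(shape.T @ reference)
        aligned.append(shape @ (u @ vt))  # best orthogonal map onto reference
    return np.array(aligned)


def build_pdm(shapes: np.ndarray, variance_kept: float = 0.95):
    """Return the mean shape vector, the retained modes and their variances."""
    aligned = align_shapes(shapes)
    x = aligned.reshape(len(aligned), -1)  # one row of coordinates per shape
    mean = x.mean(axis=0)
    _, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    variances = s ** 2 / (len(x) - 1)
    cumulative = np.cumsum(variances) / variances.sum()
    k = int(np.searchsorted(cumulative, variance_kept)) + 1
    return mean, vt[:k], variances[:k]


def reconstruct(mean: np.ndarray, modes: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Generate a shape from mode weights b: x = mean + b @ modes."""
    return (mean + b @ modes).reshape(-1, 2)
```

With all weights set to zero, `reconstruct` returns the mean shape; varying one weight at a time shows one of the main ways in which the landmarks tend to move together.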

2.4 Image analysis and articulatory assessment

A total of 200 midsagittal MR images were acquired for each subject. Speech articulatory events were described by seven distance measurements, as depicted in Figure 4. These measures join the major articulation points for consonant production (because vowels are produced without closure of the vocal tract) and cover all the (supraglottic) articulatory organs: the lips, the tongue, the palate, the velum and the pharynx.

In order to automatically extract the landmarks from the images, active appearance models were built for each sequence of sounds and each subject (Cootes, 2004). The models were built from the first 20 images of each MRI sequence and later used to automatically label the other 80 images of the sequence.

Figure 4. Articulatory measurements addressed on MR images during speech production.

For each subject, MRI measurements were performed only when an articulatory posture occurred (based on visual assessment of the vocal tract’s shape) and, to avoid errors in the analysis and automatic labeling process, rest positions (without speech activity) were excluded.
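To make the measurement step concrete, the sketch below shows one way the distances could be computed from labelled landmarks and summarised by mean and standard deviation. It is only an illustration: the pixel spacing of 1.146 mm is taken from Table 1, but the landmark names, the pairing of landmarks into distances and the use of the sample standard deviation are assumptions, since the text does not specify them.

```python
import math
from statistics import mean, stdev

PIXEL_SPACING_MM = 1.146  # isotropic in-plane pixel spacing from Table 1


def distance_mm(p, q, spacing=PIXEL_SPACING_MM):
    """Euclidean distance between two landmarks given in pixel coordinates."""
    return math.hypot(q[0] - p[0], q[1] - p[1]) * spacing


def summarise(frames, pairs):
    """Mean and sample standard deviation of each distance over a set of frames.

    frames: list of dicts mapping a landmark name to its (x, y) pixel position.
    pairs:  dict mapping a distance label to a pair of landmark names
            (hypothetical pairing, for illustration only).
    """
    summary = {}
    for label, (a, b) in pairs.items():
        values = [distance_mm(frame[a], frame[b]) for frame in frames]
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[label] = (mean(values), spread)
    return summary


if __name__ == "__main__":
    # Two made-up frames with hypothetical landmark names.
    frames = [
        {"upper_lip": (40, 60), "lower_lip": (40, 70)},
        {"upper_lip": (41, 60), "lower_lip": (41, 72)},
    ]
    pairs = {"lip_aperture": ("upper_lip", "lower_lip")}
    for label, (m, s) in summarise(frames, pairs).items():
        print(f"{label}: {m:.2f} mm (SD {s:.2f} mm)")
```

Only the frames assigned to a given target sound would be passed to `summarise`, in line with the exclusion of rest positions described above.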

3 RESULTS

This study presents data concerning the vocal tract’s shape during the utterance of the sounds of two articulatory sequences and the quantification of seven articulatory parameters.

Considering the large set of images collected, the quantitative results are presented separately for each sound and for each subject. The analysis was based on the mean values and standard deviations of the seven distances extracted for each subject from the image sets that best represent each sound.

The results of the first articulatory context for each subject are presented in Tables 2 and 3.

Table 2. Articulatory measurements of the first articulatory context for the female subject 1.

Table 3. Articulatory measurements of the first articulatory context for the female subject 2.

Comparing both subjects, larger distances are found for the pharynx width (D5) during the utterance of the vowel [i], and in general similar measurements can be observed. Except for the open-mid vowels [ɛ] and [ɔ], the global vocal tract’s shape (based on the distance trajectory) and the distances are fairly different among the subjects.

The results indicated in Tables 4 and 5 represent the measurements extracted from the images collected for the second articulatory context, the word [patu]. Comparing the measurements obtained for each uttered sound, similar values can be observed for both subjects. Larger differences are more frequent for the plosive consonants, namely in the distances D1 to D3 for the consonant [t] and D2, D5 and D6 for the consonant [p].


Table 4. Articulatory measurements of the second articulatory context for the female subject 1.

Table 5. Articulatory measurements of the second articulatory context for the female subject 2.

Considering the higher occurrence of the target sound [a] (in times per sequence) in both articulatory contexts, a cross-parameter comparison was also performed in order to assess the effects of the VV and CV articulation contexts (Figure 5).

Figure 5. Comparative articulatory measurements performed for each female subject considering the target sound [a] in two different articulatory contexts: VV in sequence 1 and CV in sequence 2.

As can be observed in Figure 5, the values of the distances measured are in general fairly similar; however, a possible effect of the CV articulatory context on the distances measured, which are clearly lower for both subjects, can be seen.

4 CONCLUSIONS

We have compared the speech articulatory measurements obtained from a large set of midsagittal images, acquired during speech production in a very reasonable acquisition time and with sufficient image resolution for analysis and subsequent quantitative assessment.

Each sound under analysis occurred at least 13 times per sequence during 48 seconds. The major drawback was found during the visual assessment of the images, when assigning images to each target sound.

By using deformable models in the image segmentation step, we reduced the time spent in the measurement process, as it passes from manual to automatic labelling: instead of manually labelling 400 images, only 80 were manually annotated.

In line with the acoustic duration of the sounds, the sound [a] was imaged more often, and the associated measurements show a more linear distribution of all distances (from lips to glottis).

Higher discrepancies of values were found for the open-mid vowels, in which the tongue is positioned halfway between a closed vowel (e.g. the vowels [i] and [u]) and an open vowel (e.g. the vowel [a]). This could be related to some errors during the visual assessment of the vocal tract’s shape when assigning images to each target sound.

In the word [patu], major differences between the subjects were found in the distances extracted for both consonants. This may be due to the characteristically short duration of both plosive articulations and to some measurement errors caused by the difficulty of automatically tracing the landmark points of the lips and alveolar region.

Compared with our previous works (Ventura et al. 2011 and Vasconcelos et al. 2010a), an improvement in the image acquisition protocol was achieved and more data concerning speech production using MRI were addressed. Hence, a large set of target sounds and articulatory parameters has been analyzed in two different articulatory contexts. Additionally, the statistical deformable models are now used to automatically extract the landmarks of the MR images and thus to reduce the time needed to extract the trajectory distances of the vocal tract’s shape under study.

Articulatory phonetics has long been a discipline with mainly qualitative analysis methods, but in the last decades the advances in MRI have allowed the quantification of several articulatory parameters.

The knowledge obtained in this study represents a direct contribution to the improvement of speech synthesis algorithms, considering the articulatory parameters measured, and can thereby provide new insights into the dynamic behavior of the articulators and co-articulation, namely in European Portuguese speech.

In addition, these results could encourage researchers and provide useful functional and non-invasive information concerning the articulation of sounds to be used in clinical practice.

In the future, an arrangement for the simultaneous recording of speech and MRI of the vocal tract’s shape will improve the accuracy of the results, namely for the image-acoustic correlation of the target sounds.

5 ACKNOWLEDGMENTS

The images considered in this work were acquired at the Radiology Department of Hospital S. João EPE, in Portugal, and we would like to express our gratitude to the technical staff.

The first author would like to acknowledge the support of the PhD grant from Escola Superior de Tecnologia da Saúde (ESTSP) and Instituto Politécnico do Porto (IPP), in Portugal.

The second author would like to acknowledge the support of the PhD grant from Fundação para a Ciência e Tecnologia (FCT), with reference SFRH/BD/28817/2006.

This work was partially done in the scope of the project “Methodologies to Analyze Organs from Complex Medical Images – Applications to Female Pelvic Cavity”, with reference PTDC/EEA-CRO/103320/2008, financially supported by FCT.

6 REFERENCES

Avila-García, M.S., Carter, J.N. & Damper, R.I. 2004. Extracting Tongue Shape Dynamics from Magnetic Resonance Image Sequences. Transactions on Engineering, Computing and Technology 2: 288-291.

Cootes, T.F., Taylor, C.J., Cooper, D.H. & Graham, J. 1992. Training models of shape from sets of examples. In Proceedings of the British Machine Vision Conference, Leeds: 9-18.

Cootes, T.F. & Taylor, C.J. 1993. Active Shape Model Search using Local Grey-Level Models: A Quantitative Evaluation. In British Machine Vision Conference, Guildford: BMVA Press.

Cootes, T.F. & Edwards, G. 1998. Active Appearance Models. In European Conference on Computer Vision, Freiburg, Germany.

Cootes, T.F. 2004. Build_aam. Available at: http://www.wiau.man.ac.uk/~bim/software/am_tools_doc/download_win.html.

Demolin, D., Sampaio, A. & Metens, T. 2006. An MRI Study of Articulatory Compensation. In 7th International Seminar on Speech Production, Brazil: Ubatuba, São Paulo.

Echternach, M., Sundberg, J., Arndt, S., Markl, M., Schumacher, S. & Richter, B. 2010. Vocal Tract in Female Registers - A Dynamic Real-Time MRI Study. Journal of Voice 24(2): 133-139.

Engwall, O. 2004. From real-time MRI to 3D tongue movements. In 8th International Conference on Spoken Language Processing (INTERSPEECH 2004 - ICSLP). Korea: Jeju Island.

Harshman, R.A., Ladefoged, P. & Goldstein, L. 1977. Factor analysis of tongue shapes. Journal of the Acoustical Society of America 62: 693-707.

Kim, Y., Narayanan, S.S. & Nayak, K.S. 2009. Accelerated Three-Dimensional Upper Airway MRI Using Compressed Sensing. Magnetic Resonance in Medicine 61: 1434-1440.

Mády, K., Sader, R., Zimmermann, A., Hoole, P., Beer, A., Zeilhofer, H. & Hannig, CH. 2001. Use of real-time MRI in assessment of consonant articulation before and after tongue surgery and tongue reconstruction. In 4th International Speech Motor Conference. Netherlands: Nijmegen.

Maeda, S. 1988. Improved articulatory models. Journal of the Acoustical Society of America 84(S1): S146-S146.

Narayanan, S., Nayak, K., Lee, S., Sethy, A. & Byrd, D. 2004. An Approach to Real-time Magnetic Resonance Imaging for Speech Production. Journal of the Acoustical Society of America 115(4): 1771-76.

Parthasarathy, V., Prince, J.L., Stone, M., Murano, E.Z. & NessAiver, M. 2007. Measuring tongue motion from tagged cine-MRI using harmonic phase (HARP) processing. Journal of the Acoustical Society of America 121(1): 491-504.

Shadle, C.H., Mohammad, M., Carter, J.N. & Jackson, P.J.B. 1999. Multi-planar Dynamic Magnetic Resonance Imaging: New Tools for Speech Research. In International Congress of Phonetics Sciences (ICPhS99). USA: San Francisco.

Stone, M., Cheng, Y. & Lundberg, A. 1997. Using principal component analysis of tongue surface shapes to distinguish among vowels and speakers. Journal of the Acoustical Society of America 101(5): 3176-3177.

Stone, M., Davis, E., Douglas, A.S., NessAiver, M., Gullapalli, R., Levine, W.S. & Lundberg, A. 2001. Modeling the motion of the internal tongue from tagged cine-MRI images. Journal of the Acoustical Society of America 109(6): 2974-2982.

Stone, M., Davis, E., Douglas, A.S. & NessAiver, M. 2001. Modeling tongue surface contours from cine-MRI images. Journal of Speech, Language, Hearing Research 410: 1-40.

Vasconcelos, M.J.M. & Tavares, J.M.R.S. 2008. Methods to Automatically Built Point Distribution Models for Objects like Hand Palms and Faces Represented in Images. Computer Modeling in Engineering and Sciences 36(3): 213-241.

Vasconcelos, M.J., Ventura, S.R., Freitas, D.R.S. & Tavares, J.M.R.S. 2010a. Towards the Automatic Study of the Vocal Tract from Magnetic Resonance Images. Journal of Voice: in press.

Vasconcelos, M.J., Ventura, S.R., Freitas, D.R.S. & Tavares, J.M.R.S. 2010b. Using Statistical Deformable Models to Reconstruct Vocal Tract Shape from Magnetic Resonance Images. Proceedings of the Institution of Mechanical Engineers, Part H: Journal of Engineering in Medicine 224(10): 1153-1163.

Ventura, S.R., Freitas, D.R. & Tavares, J.M.R.S. 2011. Toward Dynamic Magnetic Resonance Imaging of the Vocal Tract During Speech Production. Journal of Voice 25(4): 511-518.