Source: students.washington.edu/lpanfili/nwlc_panfili_2016.pdf

Linking Vowel Height and Creaky Voice

Laura M. Panfili, The University of Washington

April 23, 2016

Research Q & Hypotheses – Background – Methods – Results – Discussion – Conclusion

Research Questions and Hypotheses

•  Is creaky voice more likely to occur on low vowels than on high vowels in English?

[Figure labels: High →, Low →]

Background – Phonation

•  Phonation: the process of using air pressure to set the vocal folds into vibration

Background – Phonation Types

•  The phonation continuum (Gordon and Ladefoged, 2001):

Linking Vowel Height and Creaky Voice

Laura M. Panfili

Spring 2015

1 Introduction

Phonation has been well-studied in languages that use it phonemically (such as Jalapa Mazatec (Esposito, 2012), Hainan Cham (Thurgood, 2015) and Montana Salish (Flemming et al., 1994)). However, relatively little is known about its acoustic properties and uses in English. This study examines one aspect of the acoustics of phonation by investigating whether creaky voice is more likely to occur on low vowels than on high vowels.

Previous studies have observed patterns regarding phonation types and vowel qualities (Podesva et al., 2015; Szakay, 2012), though their findings have varied and were side observations rather than methodically researched questions. The present study examines one acoustic aspect of phonation use in English and hypothesizes that creaky voice is more likely to occur on low vowels than on high vowels. This pattern is demonstrated in a corpus of spontaneous conversations in Pacific Northwest English and is theoretically linked to Intrinsic Fundamental Frequency (IF0). Though further study and the inclusion of more dialects and languages are needed, the results of this study potentially have significant implications for experimental design in studies of phonation, as well as for the discussion of IF0 and its mechanisms.

2 Background

2.1 Phonation

Phonation is the process of using air pressure to set the vocal folds into vibration, producing a quasi-periodic sound wave. We are able to manipulate our vocal folds in various ways - their thickness, length, and separation - using the muscles around them. These manipulations change the quality of the sound produced (Raphael et al., 2007). Gordon and Ladefoged (2001) describe a continuum of phonation, ranging from spread vocal folds to closed vocal folds, with different types of vibration in between, as seen in Figure 1. These laryngeal parameters and phonation types are overviewed in Section 2.1.1; they produce different acoustic properties, which are overviewed in Section 2.1.2.

Figure 1: The Phonation Continuum (after Gordon and Ladefoged, 2001)

Figure 1 labels, in order from spread to closed vocal folds (the three middle settings are the voiced sounds):

•  No vocal fold vibration because vocal folds are spread
•  Vocal folds spend more time open than closed
•  Vocal folds spend approximately equal amounts of time open and closed, maximum vibration
•  Vocal folds spend more time closed than open
•  No vocal fold vibration because vocal folds are closed (glottal stop)

Graphics by Dan McCloy

Background – Phonation

•  Voice quality changes based on manipulation of the vocal folds (Raphael et al., 2007):
   – Thickness
   – Length
   – Separation

•  Phonation is used in different ways by different languages

[Figure labels: Breathy, Modal, Creaky; example word: “YEAH”]

Methods – The Corpus

•  ATAROS (Automatic Tagging and Recognition of Stance) Corpus (Freeman, 2015)

•  Pairs of native PNW English speakers
   – Roughly matched for age
   – Matched or crossed for gender

•  Five collaborative tasks designed to elicit changes in stance

•  Recorded at the UW Phonetics Lab


Methods – The Sample

•  8 pairs (16 speakers)
   – 11 female, 5 male
   – 21–70 years old

•  “Budget Task”
   – Asked to work together to balance an imaginary town budget
   – The final of five tasks → most natural speech

•  ~95 minutes of conversation

Table 1: Gender and Age of Speakers

Dyad 1    F, 21    M, 24
Dyad 2    F, 70    F, 68
Dyad 3    M, 26    F, 24
Dyad 4    M, 24    F, 23
Dyad 5    F, 21    F, 27
Dyad 6    F, 49    M, 49
Dyad 8    F, 39    M, 38
Dyad 9    F, 23    F, 19

3.2 Annotating Phonation

Conversations in the ATAROS corpus were previously manually transcribed and force-aligned. Following the phone-level boundaries already indicated in the corpus, all stressed vowels (primary and secondary stress, for a total of 13,834 vowels) were tagged for their phonation type.

Two raters trained in phonetics listened to all stressed vowels in the eight ATAROS dyads. In order to accurately represent what we hear as different phonation types (as opposed to the acoustic properties phoneticians are trained to recognize in spectrograms), the raters were instructed to rely on their ears to make a judgement; the spectrogram, pitch track, intensity track, and formant tracks were all shut off and raters were discouraged from looking at the waveform. They were trained by listening to multiple examples of vowels exhibiting the acoustic properties of each of the three phonation types. Example waveforms and spectrograms for the three phonation types with each of the four corner vowels are provided in Figure A.2 in the appendix. The two raters had good inter-rater reliability (Cohen’s kappa = 0.85, computed on the 12.07% of the data rated by both).
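As a rough illustration of the reliability statistic reported above, the sketch below computes Cohen's kappa for two raters over the three phonation tags. The label sequences are invented toy data, not the study's ratings, and the function name `cohens_kappa` is a hypothetical helper introduced only for this example.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    return (p_observed - p_chance) / (1 - p_chance)


# Toy ratings (invented for illustration): B = breathy, M = modal, C = creaky.
rater_1 = list("MMMMCCCBMM")
rater_2 = list("MMMMCCMBMC")
print(round(cohens_kappa(rater_1, rater_2), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because modal tokens dominate the data and two raters would often agree on "M" even when guessing.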

Stressed vowels were given one of the five following tags:
•  B: Breathy
•  M: Modal
•  C: Creaky
•  0: Flaw in recording or alignment (e.g. clipping)
•  1: Something interesting but irrelevant or problematic (e.g. laugh-speech)

Two types of data were excluded from the analysis. First, vowels tagged as 0 or 1 were not included, as they represented either unusable audio or none of the three phonation types. Second, function words that tend to include reduced vowels (phonetic ‘stop words’) were excluded to ensure that only stressed vowels were part of the analysis. A complete list of these phonetic stop words can be found in the appendix in Figure A.1. After removing vowels tagged as 0 or 1 and those belonging to function words, the remaining 7,605 vowels were included in the analysis.[1]

[1] This study did not control for position in utterance. Creaky voice is often found phrase-finally, but it seems unlikely that all the low vowels and none of the high vowels included in this study were also found phrase-finally.
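The two exclusion steps described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `(word, vowel, tag)` record layout and the helper `keep_token` are hypothetical, the example tokens are invented, and `STOP_WORDS` holds just a few entries from the Figure A.1 list rather than the full set.

```python
# A handful of entries from the stop-word list in Figure A.1 (not the full list).
STOP_WORDS = {"a", "and", "the", "of", "to", "i", "uh", "um"}
USABLE_TAGS = {"B", "M", "C"}  # breathy, modal, creaky


def keep_token(word, tag):
    """True if a stressed-vowel token survives both exclusion steps."""
    return tag in USABLE_TAGS and word.lower() not in STOP_WORDS


# Hypothetical (word, vowel, tag) records; the real corpus format is not shown here.
tokens = [
    ("budget", "ʌ", "M"),
    ("the", "i", "C"),      # stop word: excluded
    ("pool", "u", "1"),     # tagged 1 (e.g. laugh-speech): excluded
    ("taxes", "æ", "C"),
    ("and", "æ", "M"),      # stop word: excluded
    ("police", "i", "0"),   # tagged 0 (recording flaw): excluded
]
kept = [t for t in tokens if keep_token(t[0], t[2])]
print(kept)
```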

Methods – Tagging Phonation

•  Vowels were tagged for phonation type based on auditory judgments
•  Two phonetically trained raters
   – Cohen’s Kappa 0.85

•  Tags:
   – B: Breathy
   – M: Modal
   – C: Creaky
   – 0: Flaw in the recording or alignment
   – 1: Something interesting but irrelevant or problematic


Methods – Data in the Analysis

•  The final sample excludes:
   – Vowels tagged as 0 or 1
   – Unstressed vowels
   – Reduced vowels
   – All but the four “corner vowels” /i u æ ɑ/

•  The analyzed data includes /i u æ ɑ/ tagged as B, M, C in stressed syllables
   → 2,459 vowels

Results – Vowel Spaces

•  Plotted vowels to verify that they are representative of a typical PNW vowel space
•  Modal tokens only for these vowel plots

4 Results and Analysis

4.1 Vowel Spaces

To verify that the speakers produced a set of vowels representative of a typical Pacific Northwest vowel space (as in Wassink, 2015 and Wright and Souza, 2012), particularly that high vowels are high and low vowels are low, the first and second formants of each speaker’s four corner vowels were plotted. These plots were created using the formants for only modal tokens.[2] All modal tokens of /æ/, /ɑ/, /i/ and /u/, with formants measured at the midpoint, were included, for a total of 1,705 vowels (see Table 2 for a complete breakdown of tokens of vowel qualities).

Figures 4 and 5 show the average vowel spaces across all male and female speakers, respectively. While there is significant overlap between the front and back vowels for both men and women, it is important to note that high and low vowels remain distinct, making this data useful for studying the relationship between vowel height and voice quality. Note that the considerable overlap between /i/ and /u/ is typical of the Pacific Northwest, where /u/ is fronted (Wassink, 2015; Freeman, 2015). A vowel space for each speaker can be found in Figure A.3, and a Nearey2-normalized aggregate vowel space for all speakers in Figure A.4 of the Appendix. Vowel spaces were normalized and produced using the phonR package (McCloy, 2015) in R.

Figure 4: Average Vowel Space, Male Speakers

[2] The fundamental frequency of non-modal vowels is extremely difficult to accurately calculate, causing formants to also be difficult to determine.

Figure 5: Average Vowel Space, Female Speakers

4.2 Vowel Quality and Voice Quality - Descriptive Results

To examine the relationship between phonation type and vowel quality, the frequency of each phonation type was calculated for each of the corner vowels. The descriptive results are summarized in Table 2 and illustrated in Figure 6, a stacked bar graph showing the relative frequencies of the three phonation types for the four corner vowels /æ/, /ɑ/, /i/ and /u/. Of the 850 tokens of the low front vowel /æ/, 7.41% were breathy, 58% were modal, and 34.59% were creaky. Of the 496 tokens of the low back vowel /ɑ/, 5.24% were breathy, 68.35% were modal, and 26.41% were creaky. Of the 698 tokens of the high front vowel /i/, 6.88% were breathy, 77.79% were modal, and 15.33% were creaky. Of the 415 tokens of the high back vowel /u/, 3.37% were breathy, 79.52% were modal, and 17.11% were creaky.

Table 2: Phonation Types, by Vowel Quality, Totals (Percentages)

Vowel    Breathy        Modal            Creaky          Total
æ        63 (7.41%)     493 (58%)        294 (34.59%)    850 (34.57%)
ɑ        26 (5.24%)     339 (68.35%)     131 (26.41%)    496 (20.17%)
i        48 (6.88%)     543 (77.79%)     107 (15.33%)    698 (28.39%)
u        14 (3.37%)     330 (79.52%)     71 (17.11%)     415 (16.88%)
Total    151 (6.14%)    1705 (69.33%)    603 (24.53%)    2459
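To make the arithmetic behind the table easy to re-check, here is a minimal sketch that recomputes row totals and within-vowel percentages from the counts, and then sums the corner vowels into the low and high groups used in the height analysis. The variable and function names are illustrative only; the counts are copied from Table 2.

```python
# Token counts from Table 2: (breathy, modal, creaky) per corner vowel.
COUNTS = {
    "æ": (63, 493, 294),
    "ɑ": (26, 339, 131),
    "i": (48, 543, 107),
    "u": (14, 330, 71),
}
HEIGHT = {"æ": "low", "ɑ": "low", "i": "high", "u": "high"}


def percentages(counts):
    """Convert a tuple of counts to percentages of its own total."""
    total = sum(counts)
    return tuple(round(100 * c / total, 2) for c in counts)


# Collapse the four vowels into low and high by summing counts column-wise.
by_height = {}
for vowel, counts in COUNTS.items():
    previous = by_height.get(HEIGHT[vowel], (0, 0, 0))
    by_height[HEIGHT[vowel]] = tuple(p + c for p, c in zip(previous, counts))

for vowel, counts in COUNTS.items():
    print(vowel, sum(counts), percentages(counts))
for height, counts in by_height.items():
    print(height, sum(counts), percentages(counts))
```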


Results – Vowel Height and Creak

•  Low vowels are significantly more likely to be creaky than high vowels
•  χ²(1, N = 2459) = 83.58, p < .001

Vowel Height    Breathy       Modal            Creaky         Total
Low             89 (6.6%)     832 (61.8%)      425 (31.6%)    1346 (54.7%)
High            62 (5.6%)     873 (78.4%)      178 (16%)      1113 (45.3%)
Total           151 (6.1%)    1705 (69.3%)     603 (24.5%)    2459
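For readers who want to see how a chi-square test of independence works on these counts, the following is a minimal sketch that tests creaky versus non-creaky tokens against vowel height. It is an assumption that the test was set up this way: the slides do not say exactly which categories were compared or whether a continuity correction was applied, so this plain Pearson version need not reproduce the reported statistic exactly. The function name `chi_square_2x2` is hypothetical.

```python
import math

# Counts from the vowel-height table: creaky vs. non-creaky (breathy + modal).
table = [
    [425, 89 + 832],   # low vowels
    [178, 62 + 873],   # high vowels
]


def chi_square_2x2(obs):
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = chi_square_2x2(table)
print(f"chi2(1, N = {sum(map(sum, table))}) = {chi2:.2f}, p = {p:.3g}")
```

The expected count in each cell is what independence of height and creak would predict from the row and column totals; the statistic grows as the observed counts depart from those expectations.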

Results – Vowel Height and Creak

[Figure: phonation type by vowel height (low, high), with an arrow marking the creaky portion]

Results - Gender

•  Do women use creaky voice more than men?

          Breathy       Modal            Creaky         Total
Male      22 (3.1%)     514 (72.1%)      177 (24.8%)    713
Female    129 (7.4%)    1191 (68.2%)     426 (24.4%)    1746

[Figure: phonation type by gender (Women, Men)]

Discussion

•  Intrinsic Fundamental Frequency (IF0): low vowels have a lower pitch than high vowels (Whalen and Levitt, 1995)
   – The tongue position required in high vowels pulls on the larynx, increasing tension on the vocal folds → higher F0
   – Creaky voice is produced with low longitudinal tension on the vocal folds, so it would be harder to achieve on high vowels

Conclusion

•  Low vowels are more likely to be creaky than high vowels in this corpus of PNW English

•  Men and women use creaky voice at nearly the same rate (24.8% vs. 24.4% of vowels)

•  Physiology may underpin this pattern – high vowels cause more vocal fold tension than low vowels, and creaky voice requires low tension


Future Directions

•  What about breathy voice?
•  Does this pattern hold in other dialects or languages?
•  In languages that use phonation contrastively, are creaky high vowels as frequent as creaky low vowels in the inventory? (Check back with me on this in a few weeks! ☺)

QUESTIONS?


References

Esling, J. H. (2006). Voice Quality. In Encyclopedia of Language and Linguistics, pages 470–474. Oxford: Elsevier.
Freeman, V. (2015). The Phonetics of Stance-Taking. PhD thesis, University of Washington.
Gordon, M. and Ladefoged, P. (2001). Phonation types: a cross-linguistic overview. Journal of Phonetics, 29:383–406.
Ladefoged, P. and Johnson, K. (2015). A Course in Phonetics. Wadsworth, 7th edition.
Laver, J. (1980). The Phonetic Description of Voice Quality. Cambridge University Press, 1st edition.
McCloy, D. (2015). phonR: tools for phoneticians and phonologists. R package version 1.0-3.
Ohala, J. J. and Eukel, B. W. (1987). Explaining the intrinsic pitch of vowels. In Channon, R. and Shockey, L., editors, In honor of Ilse Lehiste, pages 207–215. Dordrecht.
Raphael, L. J., Borden, G. J., and Harris, K. S. (2007). Speech Science Primer: Physiology, Acoustics, and Perception of Speech. Lippincott Williams & Wilkins, 5th edition.
Wassink, A. (2015). Sociolinguistic patterns in Seattle English. Language Variation and Change, 27:31–58.
Whalen, D. H. and Levitt, A. G. (1995). The universality of intrinsic f0 of vowels. Journal of Phonetics, 23:349–366.

EXTRA SLIDES


Phonetic Stop Words Excluded from Analysis

A Appendices

Figure A.1: Phonetic Stop Words

a, about, all, am, an, and, any, are, as, at, be, been, before, being, but, by, can, cant, can’t,
cause, could, cuz, did, do, does, doing, dont, don’t, dunno, else, em, cept, few, for, from, get, gets, going,
gonna, got, gotten, had, has, have, haven’t, havent, havin, he, her, hers, him, his, how, i, i’d, id, if,
i’m, im, in, is, it, its, just, k, kay, let, let’s, lets, like, lot, may, me, my, nd, of,
on, or, our, ours, out, own, r, she, should, so, some, still, that, thats, that’s, the, their, theirs, them,
then, there, theres, there’s, these, they, this, those, til, till, to, uh, us, um, very, wanna, want, wants, was,
we, well, went, were, what, when, where, which, while, who, will, with, would, you, your, yours

Phonation Type – Corner Vowels

Figure 6: Phonation Type for Corner Vowels

No strong relationship emerges between vowel quality and breathy voice; front vowels seem to be very slightly more frequently breathy than back vowels, though this difference appears trivial. More noticeable are differences in creaky voice - low vowels are more frequently creaky than high vowels. Because phonation appears to pattern by height, and the focus of this study is creaky voice, I will continue my analysis with data grouped into low (/æ/ & /ɑ/) and high (/i/ & /u/) vowels.

4.3 Vowel Height and Voice Quality - Results

The results of Section 4.2 support collapsing /æ/ & /ɑ/ into low vowels and /i/ & /u/ into high vowels, as they pattern similarly regarding voice quality. These two categories were submitted to a chi-square test of independence to test the relationship between vowel height and phonation.

The descriptive statistics regarding vowel height and voice quality are summarized in Table 3 and illustrated in Figure 7. Of the 1346 tokens of low vowels, 6.61% were breathy, 61.81% were modal, and 31.58% were creaky. Of the 1113 tokens of high vowels, 5.57% were breathy, 78.44% were modal, and 15.99% were creaky.


Figure A.5: Phonation Types by Vowel Height, For Each Speaker


Longitudinal Tension

•  “the degree of stretching force” on the vocal folds (Zemlin, 1998)

•  Controlled by the thyroarytenoid muscle
•  Creaky voice involves low LT
   – Shorter vocal folds
   – More mass per unit length
   – Slower vibration → lower F0

Laryngeal Cartilages and Parameters

2.1.1 Physiology of Phonation Types

Laver (1980) describes three “laryngeal parameters” that, when combined in various permutations, produce different phonation types. These parameters are longitudinal tension, medial compression, and adductive tension; they are determined by actions of the muscles controlling the cartilages around the vocal folds - the thyroid, cricoid, arytenoid, and posterior cricoarytenoid cartilages. Figure 2 (after Laver p. 109) illustrates the cartilages and laryngeal parameters.

Figure 2: Laryngeal Cartilages and Parameters, after Laver

[Figure 2 labels: Thyroid Cartilage, Cricoid Cartilage, Arytenoid Cartilage, Posterior Cricoarytenoid Cartilage; Longitudinal Tension, Medial Compression, Adductive Tension]

[Figure 1 axis labels: Spread Voiceless, Breathy, Modal, Creaky, Closed Voiceless]

Medial compression is the amount of force bringing the vocal folds together at the midline. This compression determines how much the vocal folds are approximated (Zemlin, 1998), and is controlled by various muscles. The lateral cricoarytenoid muscle (connecting the cricoid and arytenoid cartilages) rotates the arytenoid cartilages, bringing them towards the midline. The arytenoid cartilages are also brought together in adductive tension, caused by the interarytenoid muscles. These two forces bringing the arytenoid cartilages together at one end of the glottis, combined with increased tension in the thyroarytenoid muscle, increase medial compression.

Longitudinal tension is “the degree of stretching force” on the vocal folds (Zemlin, 1998). It is controlled by the thyroarytenoid muscle, which connects the thyroid and arytenoid cartilages. When unopposed, its contraction reduces longitudinal tension by shortening the vocal folds, causing them to have more mass per unit length and therefore to vibrate more slowly, resulting in a lower fundamental frequency (Raphael et al., 2007). (However, when the contraction of the thyroarytenoid muscle is opposed, vocal fold tension increases (Zemlin, 1998).) The cricothyroid muscle, connecting the cricoid and thyroid cartilages, also impacts the length of (and therefore tension on) the vocal folds; its contraction decreases the distance between the cricoid and thyroid cartilages, which increases longitudinal tension (Zemlin, 1998).

These three forces on the vocal folds - medial compression, adductive tension, and longitudinal tension - work together in various ways to produce the full range of phonation types seen in Figure 1. The following is an overview of the laryngeal settings involved in each phonation type.


Vowel Space, All Speakers, Nearey 2-Normalized

Figure A.4: Average Vowel Space, Across All Speakers (Nearey 2 Normalized)
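As a hedged illustration of what this kind of normalization does, the sketch below applies a shared log-mean normalization to one speaker's formants. It assumes that "Nearey 2" refers to the variant that subtracts a single grand mean of log formant values pooled across formants, and it uses only F1 and F2 for simplicity; the phonR implementation used for the figure may differ in details. The formant values and the function name `nearey_shared_log_mean` are invented for this example.

```python
import math


def nearey_shared_log_mean(formants):
    """Normalize one speaker's (F1, F2) pairs in Hz by a single shared log-mean.

    Returns log-scale (F1, F2) pairs; a speaker's normalized values average to zero.
    """
    logs = [(math.log(f1), math.log(f2)) for f1, f2 in formants]
    # One grand mean over all log formant values of this speaker (F1 and F2 pooled).
    grand_mean = sum(l1 + l2 for l1, l2 in logs) / (2 * len(logs))
    return [(l1 - grand_mean, l2 - grand_mean) for l1, l2 in logs]


# Invented midpoint formants (Hz) for one speaker's corner vowels.
speaker = {"i": (300, 2300), "u": (320, 1700), "æ": (750, 1750), "ɑ": (700, 1100)}
normalized = dict(zip(speaker, nearey_shared_log_mean(list(speaker.values()))))
for vowel, (f1, f2) in normalized.items():
    print(vowel, round(f1, 3), round(f2, 3))
```

Because each speaker's values are centered on that speaker's own log-mean, differences in overall vocal tract size are reduced while the relative positions of vowels (for example, higher F1 for low vowels) are preserved.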
