Three examples of sound-system research using web-available materials Andy Wedel LSA Summer Institute: The Data Goldmine July 9, 2015

Three examples of sound-system research

using web-available materials

Andy Wedel

LSA Summer Institute: The Data Goldmine

July 9, 2015

1. Functional load and diachronic phoneme inventory change
– Published literature on sound change in combination with phonemically coded corpora

2. Lexical competition and hyperarticulation in natural speech
– Phonetic measures in the Buckeye Corpus in combination with lexical data on English

3. Correlation between crosslinguistic and language-internal phoneme frequencies
– A database of phoneme inventories combined with available phonemically coded corpora

Organizational steps in research

• What is the question?
– Identify your general hypothesis
• What is the approach?
– Operationalize your hypothesis
– Develop a method/experiment
• Find data/create materials
• Analysis/Results
• Dissemination

1. Functional load and diachronic phoneme inventory change

With:
Abby Kaplan, Department of Linguistics, University of Utah
Scott Jackson, Center for the Advanced Study of Language (CASL), University of Maryland


Functional load and diachronic phoneme inventory change

1. What is the question?

2. What is the approach?

3. Find data/create materials

4. Analysis/Results


Phoneme inventories change over time


– Gilliéron (1918), Jakobson (1931), Mathesius (1931), Trubetzkoy (1939)

– Martinet (1952), King (1967), Hockett (1967)

– Surendran & Niyogi (2006), Silverman (2011), Kaplan (2011)

Does the functional load of a phoneme contrast influence its trajectory of change?


Functional load

“The notion of functional load is that a phonemic system … has a (quantifiable) job to do, and that the contrast between any two phonemes, say /a/ and /b/, carries its share.” (Charles Hockett 1967)


Functional load

Specific hypothesis: Neutralization is less likely for contrasts that have a higher functional load (Martinet 1955, Hockett 1967).

Phoneme Mergers

/ɑ ~ ɔ/ merger in western American English: cot /ɑ/ ~ caught /ɔ/

How has functional load been operationalized?

• In terms of the lexicon:
– Number of minimal pairs (Martinet 1955)
• Various ways of counting number of homophones (Silverman 2009, Kaplan in press)
– Lexical-level entropy (Surendran and Niyogi 2006)
• In terms of the sound system:
– Type or token phoneme frequency (Currie-Hall 2010)
– Phoneme-level entropy (Hockett 1967, King 1967, Surendran and Niyogi 2006)

Why hadn’t this been successfully tested before?

• Previous approaches involve case studies:
1. Find a contrast merger or set of mergers.
2. Assess the change in the system given your favorite measure of functional load.
3. Compare to a set of similar contrasts that did not merge.
4. Is the change in the system smaller for the actual mergers than for the non-mergers?
• Problem: if we assume that functional load is just one of many factors influencing sound change, we expect many ‘exceptions’ to the hypothesis.

We need to assess outcomes statistically.


Functional load and diachronic phoneme inventory change

1. What is the question?

2. What is the approach?

3. Find data/create materials

4. Analysis/Results

Strategy for dealing with data sparseness and diversity of data sources

1. Pool data on mergers from multiple languages.
2. Use linear mixed effects modeling.
– Random effects structure helps control for structure inherent in different data sources.

What’s the balance between hypothesis generation and testing?

• Broad general hypothesis to be tested:
– Functional load predicts merger
• Narrower hypotheses to be explored:
– What specific measure(s) of functional load are predictive?


Functional load and diachronic phoneme inventory change

1. What is the question?

2. What is the approach?

3. Find data/create materials

4. Analysis/Results


Building a database

• Hockett 1955 “... unfortunately the determination [of functional load] has not been made yet [because] the amount of counting and computation is formidable, so we can give no example ...”

Use existing frequency corpora to build a large database of reasonably recent mergers and associated comparison sets.


Find word lists from a variety of languages

• We don’t know what measure of functional load is appropriate, so we want to be able to test a variety of measures:
– Minimal pair count
– Average neighborhood density
– System entropy
• Requirements for each word list:
– Phonemically coded
– Lemmatized
– Frequency

Find word lists from a variety of languages

• German
• Dutch, RP English
• American English
• Spanish
• French
• Turkish
• Korean
• HK Cantonese

Material won’t perfectly match your question

• Key!
– Always keep your eyes open for new data sources.
– Be ready to do some work to transform information into a form appropriate for your question.
– You’ll often have to make semi-arbitrary decisions.
• Keep notes, and be ready to describe/defend your choices.
• Examples differing in ease:
– Turkish > American English > Spanish

Turkish: easy to work with

• Obtained by emailing authors
• Easy to work with:
– Orthographic coding already near-phonemic
• coding is pre-merger
– Morphologically parsed into stem + affixes
– Syntactic category given
– ArisoyTurkishData
– LemmaForms

American English: moderately easy to work with

• Get standard US pronunciation from the Carnegie Mellon Pronouncing Dictionary (CMUdict)
• Frequency databases freely available
– CELEX, SubtlexUS
– How to deal with homographs?
• Example output files with neighborhood density (ND) calculated
– LemmaForms

Spanish: more complex

• Spanish Gigaword corpus (Linguistic Data Consortium)
– Text files from newswires
– Example
• Use TreeTagger to morphologically parse and add categories
• Example of output
• Map to phonemic representation and count
• Show code and output

Looking for changes of interest

• Look through the literature for diachronically recent phoneme mergers in varieties of these languages that share the same phonemic inventory as the dialect on which the word list is based.
– For example:
• American and RP English have distinct vowel inventories;
• RP and Australian English share phoneme inventories, even though they are phonetically different.

Looking for changes of interest

• Identify a set of comparison phonemes of the same major class (consonant, vowel) as the merged phoneme pair that are phonologically similar.
– One basic feature distant, e.g., t ~ d, t ~ k, u ~ o
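As a rough illustration of the "one basic feature distant" criterion, the sketch below enumerates phoneme pairs that differ in exactly one feature. The feature table `FEATURES` and the function names are hypothetical and chosen for this example only; a real comparison set would be built from a full feature system for each language.

```python
from itertools import combinations

# Hypothetical toy feature table (an assumption for illustration, not the study's features).
FEATURES = {
    "t": {"voice": "-", "place": "coronal", "manner": "stop"},
    "d": {"voice": "+", "place": "coronal", "manner": "stop"},
    "k": {"voice": "-", "place": "dorsal", "manner": "stop"},
    "g": {"voice": "+", "place": "dorsal", "manner": "stop"},
    "s": {"voice": "-", "place": "coronal", "manner": "fricative"},
}

def feature_distance(a, b):
    """Number of features on which phonemes a and b differ."""
    return sum(FEATURES[a][f] != FEATURES[b][f] for f in FEATURES[a])

def one_feature_pairs(phonemes):
    """All unordered phoneme pairs that differ in exactly one feature."""
    return [(a, b) for a, b in combinations(sorted(phonemes), 2)
            if feature_distance(a, b) == 1]

pairs = one_feature_pairs(FEATURES)
print(pairs)
```

With this toy table, t ~ d and t ~ k qualify as comparison pairs, while d ~ k (two features apart) does not.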

56 mergers, 524 non-mergers, 8 languages

18 phoneme-pair systems: each contains at least one merger and, as comparisons, all other phoneme pairs in the same major class (vowel or consonant) that are one phonological feature apart.

Wedel, A., A. Kaplan & S. Jackson (2013). Language and Speech.
Wedel, A., S. Jackson & A. Kaplan (2013). Cognition.

Independent measures

• Lexical measures:
– Number of minimal pairs distinguished by each phoneme pair
• Write a script that goes through each phonemic form, merges the contrast using a regular expression, and counts how many other phonemic forms it becomes identical to.
– Lemma vs word-form counts
– Within/across word category

Independent measures

• Lexical measures:
– Number of lexical ‘prefixes’ distinguished by each phoneme pair (Cohen-Priva, in press)
– Average neighborhood density for words containing each phoneme
– Lexical entropy change on merger (Surendran & Niyogi 2006)
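For the neighborhood-density measure, one common definition counts as neighbors all words that differ by a single segment substitution, deletion, or insertion. The sketch below assumes that definition and one character per phoneme; the function names and the toy lexicon are illustrative only.

```python
def is_neighbor(x, y):
    """True if x and y differ by one substitution, deletion, or insertion."""
    if x == y:
        return False
    if len(x) == len(y):
        return sum(p != q for p, q in zip(x, y)) == 1
    if abs(len(x) - len(y)) != 1:
        return False
    short, long_ = sorted((x, y), key=len)
    # Deleting one segment from the longer form must yield the shorter one.
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

def neighborhood_density(word, lexicon):
    """Number of lexicon entries that are neighbors of word."""
    return sum(is_neighbor(word, other) for other in set(lexicon))

def average_density(phoneme, lexicon):
    """Mean neighborhood density over words containing the phoneme."""
    words = [w for w in set(lexicon) if phoneme in w]
    if not words:
        return 0.0
    return sum(neighborhood_density(w, lexicon) for w in words) / len(words)

lexicon = ["kat", "bat", "at", "kart", "kit", "dog"]
print(neighborhood_density("kat", lexicon), average_density("k", lexicon))
```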

Calculating functional load in terms of informational entropy (Shannon 1951)

General form (Hockett 1967, Surendran and Niyogi 2006):

$$FL(a \leftrightarrow b) = \frac{H(L) - H(L_{a \leftrightarrow b})}{H(L)}$$

where $H(L)$ is the entropy of the language $L$ and $H(L_{a \leftrightarrow b})$ is the entropy of $L$ after the contrast between $a$ and $b$ is merged.
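A small worked example may make the entropy formula concrete. The sketch below assumes the simplest case, in which entropy is computed over a word-frequency distribution and merging is done by string replacement with one character per phoneme; other choices of unit (e.g. phoneme sequences) are possible. The function names and toy frequencies are invented for illustration.

```python
import math
from collections import Counter

def entropy(freqs):
    """Shannon entropy (in bits) of a frequency distribution."""
    total = sum(freqs.values())
    return -sum((n / total) * math.log2(n / total)
                for n in freqs.values() if n > 0)

def functional_load(word_freqs, a, b):
    """Relative entropy loss when phonemes a and b are merged."""
    merged = Counter()
    for form, n in word_freqs.items():
        merged[form.replace(b, a)] += n   # collapsing b into a may create homophones
    h = entropy(word_freqs)
    return (h - entropy(merged)) / h

# Toy word frequencies: "A" and "O" stand in for two contrasting vowels.
freqs = {"kAt": 50, "kOt": 50, "dIg": 100}
fl_vowels = functional_load(freqs, "A", "O")
fl_stops = functional_load(freqs, "k", "d")
print(fl_vowels, fl_stops)
```

Here H(L) is 1.5 bits; merging the two vowels leaves two equiprobable words (1 bit), so the functional load of that contrast is one third, while merging k and d creates no homophones and gives a functional load of zero.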

Independent measures

• Sublexical measures:
– Phoneme type/token frequencies
• uniphone, biphone, triphone
– Sublexical entropy change upon merger
– Dataset example
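Type and token frequencies for uniphones, biphones, and triphones can be gathered in a single pass over a frequency-coded word list. The following is a sketch under the assumption of one character per phoneme; `nphone_counts` and the toy word list are hypothetical.

```python
from collections import Counter

def nphone_counts(word_freqs, n):
    """Return (type_counts, token_counts) for phoneme n-grams of length n."""
    types, tokens = Counter(), Counter()
    for form, freq in word_freqs.items():
        for i in range(len(form) - n + 1):
            gram = form[i:i + n]
            types[gram] += 1       # each occurrence in a lexicon entry counts once
            tokens[gram] += freq   # each occurrence is weighted by word frequency
    return types, tokens

freqs = {"kat": 10, "tak": 2, "at": 5}
uni_types, uni_tokens = nphone_counts(freqs, 1)
bi_types, bi_tokens = nphone_counts(freqs, 2)
print(uni_types["a"], uni_tokens["a"], bi_types["at"], bi_tokens["at"])
```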


Functional load and diachronic phoneme inventory change

1. What is the question?

2. What is the approach?

3. Find data/create materials

4. Analysis/Results

Number of minimal pairs is inversely correlated with merger

Wedel, A., A. Kaplan & S. Jackson (2013). Language and Speech.


What kind of minimal pairs?

Lemma vs word form?

Within vs Between Category?

Frequency?

What does not seem to substitute for minimal pairs in this effect?

• Lexical measures
– Neighborhood measures
– Lexical entropy change
• Sublexical measures
– Sublexical entropy changes
– Uniphone, biphone, triphone probabilities


Intriguing: Higher phoneme frequency is positively correlated with merger

…but only for phoneme pairs that don’t distinguish minimal pairs.

Example model predictions

[Figure: example model predictions for American English]


What about changes that might index avoidance of merger?

• Phoneme Shift: concerted shift of a phoneme pair in the same dimensional space.

• Phoneme Split: merger of a contrast associated with enhancement of an associated contrast in a different dimension.

Phoneme Shifts

– California Vowel Shift

[Figure: vowel chart of the California Vowel Shift, with vowels u, ɪ, æ, ɑ and example words fat, dude, dress]

Phoneme Splits

– Vowel length split in Pittsburgh English
• town ~ ton: taʊn ~ tʌn → tʌ:n ~ tʌn

[Figure: vowel chart with a, ʊ and ʌ, showing town and ton at ʌ]

What’s the balance between hypothesis generation and testing?

• We already have a strong prediction that a small number of within-category minimal lemma pairs predicts merger.
• Narrower hypothesis to be explored:
– Shifts and splits…
• which are phoneme inventory changes that preserve lexical distinctions…
– are correlated with a significantly larger number of within-category minimal lemma pairs.

Get examples of shifts/splits in our set of languages

• Shifts
– Spanish voiced/voiceless stop pairs
• Lewis 2000
– American English vowel shifts: Northern Cities, Southern Shift
• Labov et al. 2006
– NZ English front vowel shifts
• Hay, Maclagan, & Gordon 2008
– Polder Dutch diphthongs
• Jacobi 2009
– Canadian French vowel shift
• Walker 1983

Database of Shifts/Splits

• Splits
– Pittsburgh /ɑʊ ~ ʌ/, Inland North /e ~ ɑ/ → vowel length
• Labov et al. 2006
– Turkish ɣ deletion → vowel length
• Lewis 1967
– NZE /dress ~ fleece/ → diphthongization
• Maclagan and Hay 2005
– Korean onsets /lax ~ aspirated/ → tone
• Silva 2006

Mergers versus Shifts and Splits

[Figure: phoneme mergers compared with phoneme splits/shifts]

Can we predict the direction of change?

• Given a phoneme-inventory change, was it
– a change that reduces lexical distinctions? (a merger)
– a change that preserves lexical distinctions? (a shift or a split)

Given a change, predicting its type

[Figure: change type (merger vs. shift/split) plotted against log minimal lemma pair count]


Individual datasets

New insights

• The distribution of a phonological contrast across the lexicon influences the trajectory of change in that phonological contrast.
• Results in maintenance of a compact phoneme inventory.
– Contrasts that support few lexical contrasts tend to be lost.
– Contrasts that support more lexical contrasts are preserved, or provide seed variation for new contrasts.


Take home message with respect to big data and computation…

• New data sources, models and technologies allow us to better test hypotheses concerning the relationship of the form of sound systems to their function in communication.

2. Lexical competition and hyperarticulation in natural speech

With: Becky Sharp, Department of Linguistics, University of Arizona


Lexical competition and hyperarticulation in natural speech

1. What is the question?

2. What is the approach?

3. Find data/create materials

4. Analysis/Results

Big question

• If the existence of minimal pairs influences change in a phoneme contrast, what are the mechanisms, at various levels?
• Theoretical prediction (e.g., Lindblom 1990, Wright 2004, Wedel 2012…):

Phonetic cues that support communication are hyperarticulated in usage.

Consistent usage biases drive phonological change and pattern formation.

Change in distribution of word variants

Change in distribution of sublexical variants across the lexicon

e.g., Baudouin de Courtenay 1895, Ohala 1989, Lindblom 1990, Bybee 2001, Blevins 2004, Baese-Berk & Goldrick 2009, Ernestus 2011, Wedel et al. in press

Wang 1969, Bybee 2002, Phillips 2006, Kraljic & Samuel 2009, Hay and Maclagan 2012

Theoretical/Linguistic/Experimental evidence:

Articulation, perception, cognitive biases, social factors, system-internal patterns, acquisition biases, …

Bias toward accurate transmission of lexical information

Biases on phonetic form of word tokens


[Within-phoneme category variants]

Selection for word-level contrast

/Phoneme category evolution/


Lexical competition and hyperarticulation in natural speech

1. What is the question?

2. What is the approach?

3. Find data/create materials

4. Analysis/Results

Background/Previous Work

• Previous work done using lab speech
• Small effects, fragile results
– VOT is slightly hyperarticulated for initial stops given minimal pairs in list reading (Baese-Berk & Goldrick 2009, Peramunage et al. 2011).
– VOT hyperarticulated on first production of words with a visual stop-competitor in the context (Kirov & Wilson 2012)
– In a lab-speech paradigm designed to elicit hyperarticulation away from a vowel competitor, tense/lax vowel duration differences increased, but not formant differences (Schertz 2015)

Work with vowels has focused on neighborhood density (ND) as the trigger for hyperarticulation, and dispersion as the outcome

• Dispersion = distance of a vowel in F1-F2 space from the center of the vowel space
• But: vowel change patterns suggest that competition-driven hyperarticulation should be more phonetically specific.
– Correlation of minimal pair count with vowel shift patterns links competition to shifts
– Vowel chain shifts often involve moves toward the center of the vowel space.


Dispersion as the outcome of competition makes the wrong prediction for vowel system change: Vowels can centralize in chain-shifts.

American Northern Cities Shift

Ok, so how to approach this?

1. Use natural speech instead of lab speech
2. Compare minimal pair existence to neighborhood density as a predictor for hyperarticulation
3. Look at both VOT for stops and F1-F2 Euclidean distance for vowels
– For vowels, compare F1-F2 distance to dispersion
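The difference between the two vowel measures can be sketched in a few lines. The example below assumes vowels are represented as raw (F1, F2) pairs in Hz and that the center of the vowel space is the mean of the per-vowel means; both are simplifying assumptions, and the formant values are made up for illustration.

```python
import math

def euclidean(v1, v2):
    """Distance between two points in F1-F2 space, each given as (F1, F2)."""
    return math.hypot(v1[0] - v2[0], v1[1] - v2[1])

def dispersion(token, vowel_means):
    """Distance of a vowel token from the center of the vowel space."""
    n = len(vowel_means)
    center = (sum(f1 for f1, _ in vowel_means.values()) / n,
              sum(f2 for _, f2 in vowel_means.values()) / n)
    return euclidean(token, center)

# Made-up formant means (Hz) for three vowels, and one measured token.
means = {"i": (300.0, 2300.0), "E": (600.0, 1900.0), "a": (900.0, 1500.0)}
token = (600.0, 2300.0)

dist_to_i = euclidean(token, means["i"])   # competitor-specific distance
disp = dispersion(token, means)            # distance from the vowel-space center
print(dist_to_i, disp)
```

The competitor-specific distance changes with the choice of neighboring vowel, whereas dispersion depends only on the token's position relative to the center, which is why the two measures can make different predictions.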


Lexical competition and hyperarticulation in natural speech

1. What is the question?

2. What is the approach?

3. Find data/create materials

4. Analysis/Results

Use the Buckeye Corpus of Conversational Speech

• 40 one-hour sociolinguistic interviews
• Gender and age balanced
• Obtained in Columbus, Ohio in 2000
• Densely annotated:
– Phonemic transcription
– Phonetic transcription
– Syntactic category
– Textfiles, phonefiles

VOT in word-initial stops

• Use a Perl script to identify appropriate material in the Buckeye Corpus
– words starting with [ptkbdg]
– content words
– 1, 2 syllables
– no preceding/following utterance or disfluency boundary
– no preceding word-final stop
• Measure closure and burst lengths
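The selection criteria above can be expressed as a simple filter. The slides mention a Perl script; the sketch below uses Python instead and does not read the actual Buckeye file format. The token dictionaries, their keys (`phones`, `category`, `syllables`, `boundary`), and the category labels are all hypothetical.

```python
import re

STOP_INITIAL = re.compile(r"^[ptkbdg]")
FINAL_STOP = re.compile(r"[ptkbdg]$")
CONTENT_TAGS = {"N", "V", "ADJ", "ADV"}   # hypothetical content-word labels

def eligible(tokens, i):
    """True if token i meets the word-initial stop selection criteria."""
    tok = tokens[i]
    if tok["boundary"] or i == 0 or i == len(tokens) - 1:
        return False
    prev, nxt = tokens[i - 1], tokens[i + 1]
    return (not prev["boundary"] and not nxt["boundary"]   # no adjacent boundary
            and STOP_INITIAL.match(tok["phones"]) is not None
            and tok["category"] in CONTENT_TAGS
            and tok["syllables"] in (1, 2)
            and FINAL_STOP.search(prev["phones"]) is None)  # no preceding final stop

def word(phones, category, syllables):
    return {"phones": phones, "category": category,
            "syllables": syllables, "boundary": False}

PAUSE = {"phones": "", "category": None, "syllables": 0, "boundary": True}

tokens = [PAUSE, word("D@", "DET", 1), word("pEn", "N", 1), word("Iz", "V", 1),
          word("nat", "ADV", 1), word("bIg", "ADJ", 1), word("En@f", "ADV", 2), PAUSE]
selected = [t["phones"] for i, t in enumerate(tokens) if eligible(tokens, i)]
print(selected)
```

In this toy utterance only "pEn" is selected: "bIg" is excluded because the preceding word ends in a stop.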

Stop length, burst and offset

[Figure: “a pea” vs. “a bee”, with stop length and burst marked for [p] and [b]]

VOT data creation

• Annotate stop beginning, burst and offset using Praat.
– Get lots of undergraduate helpers for this…


Praat example

Get dependent measure

• Script that processes Praat TextGrid text file to obtain:
– Stop length, burst length
– Use burst/length ratio as a rate-normalized measure of VOT (Yao 2007)
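Once the three annotated time points are extracted from the TextGrid, the dependent measure is a simple ratio. The sketch below assumes that stop length runs from stop beginning to offset and burst length from burst to offset; this reading of the annotation scheme, and the function name, are assumptions. Parsing the TextGrid file itself is not shown.

```python
def burst_length_ratio(stop_start, burst_start, offset):
    """Burst duration divided by total stop duration (closure plus burst)."""
    if not (stop_start <= burst_start <= offset) or offset == stop_start:
        raise ValueError("expected stop_start <= burst_start <= offset")
    stop_length = offset - stop_start
    burst_length = offset - burst_start
    return burst_length / stop_length

# Times in seconds for one annotated stop (made-up values).
ratio = burst_length_ratio(1.0, 1.5, 2.0)
print(ratio)
```

Because both durations shrink or stretch together with speaking rate, the ratio is less sensitive to rate than the raw burst duration.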

Get independent factors of interest

• Minimal pair existence
– Carnegie Mellon Pronouncing Dictionary
• Neighborhood density
– Calculate independently
– IPhOD


Lexical competition and hyperarticulation in natural speech

1. What is the question?

2. What is the approach?

3. Find data/create materials

4. Analysis/Results

Burst/length ratio by minimal pair existence: initial stops distinct in voice

[Figure: burst/length ratio for voiced and voiceless stops, each split into No MinPairs vs. MinPairs; example words bat, pat, badge, pant]

Relationship of Neighborhood Density to Burst/length Ratio

[Figure: burst/length ratio plotted against number of lexical neighbors]

Factors in Linear Mixed Effects modeling

• Stop-voicing minimal pair competitor existence
• Neighborhood density
• Control factors:
– local speech rate
– word category
– forward/backward bigram probabilities
– word frequency
– previous mention
– syllable number
– stop identity
– following high (liquids, rhotics, high vowels)

Get control factors

• From Buckeye word files:
– Word identity
• previous, target, and following word
– Word category
– Previous mention
– Speech rate
• syllables per second in local utterance

Get control factors

• From corpora, get forward/backward bigram probability
– Google n-gram
– Fisher English Training set
• see Seyfarth 2014 for example
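Forward and backward bigram probabilities can be estimated from raw counts. The sketch below assumes the usual definitions, P(word | previous word) and P(word | following word), estimated by relative frequency without smoothing; the function names and the three-sentence corpus are invented, and a real analysis would draw counts from a large corpus such as those listed above.

```python
from collections import Counter

def bigram_model(sentences):
    """Return (forward, backward) probability functions from tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))

    def forward(prev, word):
        """P(word | previous word), by relative frequency."""
        return bigrams[(prev, word)] / unigrams[prev]

    def backward(word, nxt):
        """P(word | following word), by relative frequency."""
        return bigrams[(word, nxt)] / unigrams[nxt]

    return forward, backward

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
forward, backward = bigram_model(corpus)
print(forward("a", "cat"), backward("a", "cat"), forward("the", "cat"))
```

Context words that never occur in the corpus would cause a division by zero here; a real pipeline needs smoothing or a fallback for unseen words.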

Get control factors

• From IPhOD:
– SubtlexUS-based word frequency
– Neighborhood density
– Positional two-segment probability averaged over the word (Vitevitch & Luce 2004)


Voiceless Stop Model


Voiced Stop Model


Can we find this effect in vowels? Measuring vowel-vowel distances

Vowel distances in initial syllables

• Identify material in the Buckeye Corpus of Conversational Speech
– words with an initial-syllable non-back monophthong
– content words
– 1 syllable
– no preceding/following utterance or disfluency boundary
– no words with ablaut in their paradigm
• e.g., no ‘sit’, because of ‘sat’.


Dataset construction

For each word token, measure vowel distance to three neighboring vowels.

Starting dataset has three measures per word token:

Split randomly into three datasets with one measure per token. Randomly choose one dataset for statistical analysis.

Minimal Pair existence
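One way to read the splitting step on this slide is that each token's three distance measures are randomly assigned to three datasets, so that every dataset holds exactly one measure per token. The sketch below implements that reading; the row layout, the function name `split_measures`, and the fixed seeds are assumptions made for illustration.

```python
import random

def split_measures(rows, n_measures=3, seed=0):
    """Distribute each token's measures across n datasets, one measure per dataset."""
    rng = random.Random(seed)
    datasets = [[] for _ in range(n_measures)]
    for token_id, measures in rows:
        order = list(range(n_measures))
        rng.shuffle(order)                      # random assignment for this token
        for dataset, k in zip(datasets, order):
            dataset.append((token_id, k, measures[k]))
    return datasets

# Each row: (token id, [distance to vowel 1, vowel 2, vowel 3]); values are made up.
rows = [("tok1", [1.0, 2.0, 3.0]), ("tok2", [4.0, 5.0, 6.0]), ("tok3", [7.0, 8.0, 9.0])]
datasets = split_measures(rows)
chosen = random.Random(1).choice(datasets)      # pick one dataset for analysis
print(chosen)
```

Analyzing a single dataset keeps one observation per word token, which avoids treating three measures of the same token as independent data points.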

Measuring from [i]

[Figure: vowel chart with æ and ʌ labeled]

Measuring from [ɛ]

[Figure: vowel chart with e, ɛ, æ and ʌ labeled]

[Figure: minimal pair exists vs. minimal pair does not exist; labeled “more distinctive”]

Factors

• Vowel-vowel minimal pair competitor existence
• Neighborhood density
• Vowel-vowel minimal pair competitor existence in one of the other two neighboring vowels
• Control factors:
– local speech rate
– forward/backward bigram probabilities
– word frequency
– previous mention
– vowel length
– vowel-vowel pair identity

Measuring from [ɛ]

[Figure: vowel chart with æ and ʌ labeled]

LME model

library(lme4)  # provides lmer()

# Fixed effects plus by-speaker random slopes and by-lemma random intercepts.
# k is the data frame name used on the slide.
model = lmer(EuclideanDistance ~
    MinimalPair +
    Neighborhood +
    Alternative +
    VowelLength +
    Vowel_CompetitorVowel +
    (1 + MinimalPair + Neighborhood + Alternative + VowelLength | Speaker) +
    (1 | Lemma),
  data = k, REML = FALSE)


Model output


What about dispersion?

• Run the same kind of analysis using vowel-center distance.

• Factors that significantly predict dispersion:
– Word frequency
– Vowel length

• Neighborhood density and minimal pair competitor existence are not predictive.

Summary

• Phonetic cues that contribute strongly to distinguishing words tend to be hyperarticulated in natural speech
– VOT in initial stops
– F1-F2 distance in vowels
• Consistent with the idea that phoneme contrast is maintained in part by a bias toward lexical contrast. This bias maintains an efficient set of phoneme contrasts over language change: phonemes that do not distinguish many words are vulnerable to loss.